Friday, November 14, 2008

Performance of Collections in .NET

With some future prospects of writing a high scale LOB Silverlight application, I thought performance of the application would be crucial. I decided to look into writing optimized code and before that I decided to find out what is the performance impact of the collections that I use most commonly. To be specific, in this post, I only look at the performance of creating various collections of type int.

Large Collections

I wrote a simple performance test class which returns me collections of various types. Each method create a list of pretty large size.
public static int MAX = 19999999;
The results of the test program is shown below.

image

Name

Collection Type Explained

Ar Array (int[] array)
Ar_Enble array.AsEnumerable()
Ar_En_Cast IEnumerable<int> e = array;
List List<int>
IniList List<int> l = new List<int>(MAX);
ArList ArrayList
Range new List<int>(GetArray());
LL LinkedList<int>


From the results, it is obvious that an array is the fastest collection that you can use and on an average it is 28 times faster than ArrayList and 3 times faster than the List<T>.

Note that for this discussion, I only am using entries with green color. This is based on the assumption that GetArray() should be faster than first getting the array and then IEnumerable out of the array. Some of the key points I made out of this experiment:

  • The performance of an initialized list (IniList) is 1.3 slower than using plain array.
  • AsEnumerable() is slightly faster than the cast operation.
  • Do not use LinkedList, it is dead slow(50-70 times slower).
  • Do not use ArrayList, it is dead slow(28 times slower).
  • AddRange() or List created by passing an entire array is not faster than adding individual items to an capacity-set list, which is at least 4 times faster.
  • Use arrays when you can guess the size of the list.
  • Use initialized list, if you are not sure about the size and use TrimExcess() if memory is of your concern.

I would further like to study the List<T> class in detail since it is one collection that I use the most. In the mean time, if you are further interested in application performance, then look at the link http://msdn.microsoft.com/en-us/library/ms998574.aspx#scalenetchapt13_topic5

Update

I just came across the following points in MSDN.

How to choose between arrays and collections

Arrays are the fastest of all collection types, so unless you need special functionalities like dynamic extension of the collection, sorting, and searching, you should use arrays. If you need a collection type, choose the most appropriate type based on your functionality requirements to avoid performance penalties.

  • Use ArrayList to store custom object types and particularly when the data changes frequently and you perform frequent insert and delete operations. Avoid using ArrayList for storing strings.
  • Use a StringCollection to store strings.
  • Use a Hashtable to store a large number of records and to store data that may or may not change frequently. Use Hashtable for frequently queried data such as product catalogs where a product ID is the key.
  • Use a HybridDictionary to store frequently queried data when you expect the number of records to be low most of the time with occasional increases in size.
  • Use a ListDictionary to store small amounts of data (fewer than 10 items).
  • Use a NameValueCollection to store strings of key-value pairs in a presorted order. Use this type for data that changes frequently where you need to insert and delete items regularly and where you need to cache items for fast retrieval.
  • Use a Queue when you need to access data sequentially (first in is first out) based on priority.
  • Use a Stack in scenarios where you need to process items in a last–in, first-out manner.
  • Use a SortedList for fast object retrieval using an index or key. However, avoid using a SortedList for large data changes because the cost of inserting the large amount of data is high. For large data changes, use an ArrayList and then sort it by calling the Sort method.

The link : http://msdn.microsoft.com/en-us/library/ms998530.aspx

Why do I find Hashtable slow?

I re-conducted my experiment using Hashtable as a list (key and value) would be the same! It is dead slow, I repeat very-very slow - 1000 times slower than using an array and 80 times slower than the slowest list (un-initialized, added items witout using AddRange). But the guidelines states that Hashtable should be used to store large number of records. Is it really?

Relatively HashSet<int> was much faster. It was atleast 10 times faster to say the least. Note that you cannot use HashSet<int> instead of List<int> (since List<int> is obviously faster) because HashSet does not allow duplicate elements and is designed for high performance set operations. May not be a great test that one should really trust, but I feel simple collections are what used most of the times, so next time I would retest using objects instead of data types. My test results.

image
Ht - Hashtable and HS - HashSet<int>

From now on, I make it a point to use initialized List<int> since its performance is close to int[] and yet it is dynamic in nature. I hope not to use HashSet unless I have extremely complicated Set operations. I would use HashSet if (time taken to create HashSet + time taken to perform set operations) is less than (time taken to create a List + time taken to perform set operations using LINQ).

Performance is an interesting topic and the more you dig into it, the more knowledgeable you become about a system.

No comments: