Arrays Lists and some IEnumerable Philosophy

Wednesday, September 16, 2009 9:36 PM

So I had an "architecture moment" today at work. I was refactoring some code with Jay Smith and while modifying a few classes, we changed a few data types from List<T> to T[] (from generic lists to plain old arrays) at my request. I'll get to why in a moment. Just let it be noted first that doing this caused two things: first it caused a significant amount of work at merge time for one of my teammates, and then it caused a philosophy discussion about why we should or should not bother with plain arrays instead of their big brother List class. And then the conversation veered in the direction of using IEnumerable<T> as return types and the dangers therein.

Inflicted Pain

When we made all the changes, I thought we were doing all the grunt work for everyone else on the team. I hate when what I do causes a significant amount of work for someone else; especially when the change isn't "needed" but is more of a philosophy type change. It turned out that one of the classes heavily affected by our changes was being worked on heavily by someone else. Thus the heavy merge problems for him. Completely my bad.

Array vs List

I have a very strong belief that all code should be self documenting. Therefore, to me, if I make the data type of a property List<whatever>, I feel that is telling other developers that this is a property expecting to be modified (adding and removing things). An array, however, says to the world that here is the final un-modifiable list as it was at object creation. A side result of this belief is that I pretty much never return a List from a method call since there are very few contexts where that makes sense.

Well, there usually is a reason to use List as a data type or return type. Not a good reason, but a reason... I usually see this as a convenience because those lists do get modified by adding dummy objects (like the first item in a dropdown that says "Please Select One") before data binding.

There was one other point mentioned too - performance. Yes, I know, I know. A List is bigger than an array and calling ToList on an IEnumerable is slower than calling ToArray, but not by much. And when I say "not by much", I mean seriously, seriously small amounts. I actually did a bit of testing to be sure and this is the result I got, so this part of the argument isn't worth ever mentioning again.

ToList speed (10,000 calls): 2,089 ms
ToArray speed (10,000 calls): 1,986 ms

ToList size: 16,448 bytes
ToArray size: 16,024 bytes

Total guids created: 20,002,000

The code that generated this result is at the bottom of the post.

IEnumerable Gotcha

When I mentioned that I never return List<T> from a method and always choose T[], it was asked why I don't just return IEnumerable<T> instead of an array.

Do you happen to remember the phrase deferred execution? It basically means that any variable of type IEnumerable<T> might be a list of stuff and it might just be an execution plan that will return a list of stuff as the list is enumerated.

The gotcha in that last statement is that the execution plan will run every time the list is enumerated. Did you notice the last line of the above test result? I did that on purpose to show again that an IEnumerable<T> will execute over and over. This is why even though the size of the list of guids is 1000, the GetNewGuid() method was called over 20 million times during the test!

So, you certainly can be careful when returning IEnumerable by calling ToArray() or ToList() to force the execution plan and return a finalized list. The problem for my little brain is that I constantly forget to do that! I'll run the application or the unit test, see whacky things happen or poor performance, and then go back and fix things properly. Or at least I used to when I was on the kick where every list was an IEnumerable no matter what. Now by default I use an array and only return IEnumerable when deferred execution is my intention (remember all code is self documenting).

Enough already!

Ya, ya... that's enough rambling. I really do hate long winded blog posts! Here's the code I mentioned earlier that generated that result. Happy coding!

internal class Program
{
   private static int _numGuids;

   private static void Main()
   {
      IEnumerable<Guid> guids = Enumerable
         .Range(0, 1000)
         .Select(x => GetNewGuid());

      MeasureSpeed(guids);
      MeasureSize(guids);

      Console.WriteLine("Total guids created: {0:n0}", _numGuids);
   }

   private static void MeasureSpeed(IEnumerable<Guid> guids)
   {
      const int iterations = 10000;

      var stopwatch = new Stopwatch();
      stopwatch.Start();
      for (int i = 0; i < iterations; i++) guids.ToList();
      stopwatch.Stop();
      Console.WriteLine("ToList speed ({0:n0} calls): {1:n0} ms", iterations, stopwatch.ElapsedMilliseconds);

      stopwatch.Reset();
      stopwatch.Start();
      for (int i = 0; i < iterations; i++) guids.ToArray();
      stopwatch.Stop();
      Console.WriteLine("ToArray speed ({0:n0} calls): {1:n0} ms", iterations, stopwatch.ElapsedMilliseconds);

      Console.WriteLine();
   }

   private static void MeasureSize(IEnumerable<Guid> guids)
   {
      var startingMemory = GC.GetTotalMemory(true);

      var list = guids.ToList();
      var memoryAfterList = GC.GetTotalMemory(true);
      Console.WriteLine("ToList size: {0:n0} bytes", memoryAfterList - startingMemory);

      var array = guids.ToArray();
      var memoryAfterArray = GC.GetTotalMemory(true);
      Console.WriteLine("ToArray size: {0:n0} bytes", memoryAfterArray - memoryAfterList);

      Console.WriteLine();
   }

   private static Guid GetNewGuid()
   {
      _numGuids++;
      return Guid.NewGuid();
   }
}
Tags: .net, architecture, linq