Arrays, Lists and some IEnumerable Philosophy

by September 16, 2009 04:36 PM

So I had an "architecture moment" today at work. I was refactoring some code with Jay Smith and while modifying a few classes, we changed a few data types from List<T> to T[] (from generic lists to plain old arrays) at my request. I'll get to why in a moment. Just let it be noted first that doing this caused two things: first it caused a significant amount of work at merge time for one of my teammates, and then it caused a philosophy discussion about why we should or should not bother with plain arrays instead of their big brother List class. And then the conversation veered in the direction of using IEnumerable<T> as return types and the dangers therein.

Inflicted Pain

When we made all the changes, I thought we were doing all the grunt work for everyone else on the team. I hate when what I do causes a significant amount of work for someone else; especially when the change isn't "needed" but is more of a philosophy type change. It turned out that one of the classes heavily affected by our changes was being worked on heavily by someone else. Thus the heavy merge problems for him. Completely my bad.

Array vs List

I have a very strong belief that all code should be self-documenting. Therefore, to me, if I make the data type of a property List<whatever>, I feel that is telling other developers that this is a property expecting to be modified (adding and removing things). An array, however, says to the world that here is the final list as it was at object creation: its length is fixed, so nothing gets added or removed. (Individual elements can still be overwritten, so this is a signal of intent rather than a guarantee.) A side result of this belief is that I pretty much never return a List from a method call since there are very few contexts where that makes sense.

Well, there usually is a reason to use List as a data type or return type. Not a good reason, but a reason... I usually see this as a convenience because those lists do get modified by adding dummy objects (like the first item in a dropdown that says "Please Select One") before data binding.
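If the dummy item is the only reason for the List, the code doing the data binding can build its own list from the array. Here is a minimal sketch of that idea; GetStates and the state names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

internal static class DropdownExample
{
   // Hypothetical data source: the array return type signals "finished result".
   private static string[] GetStates()
   {
      return new[] { "Kansas", "Missouri", "Oklahoma" };
   }

   private static void Main()
   {
      // The UI code owns the mutable copy, so the placeholder never
      // leaks back into whatever produced the data.
      var options = new List<string>(GetStates());
      options.Insert(0, "Please Select One");

      foreach (var option in options)
         Console.WriteLine(option);
   }
}
```

The copy costs one extra allocation at the binding site, but the method that produces the data keeps its "this is final" signature.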

There was one other point mentioned too - performance. Yes, I know, I know. A List is bigger than an array and calling ToList on an IEnumerable is slower than calling ToArray, but not by much. And when I say "not by much", I mean seriously, seriously small amounts. I actually did a bit of testing to be sure and this is the result I got, so this part of the argument isn't worth ever mentioning again.

ToList speed (10,000 calls): 2,089 ms
ToArray speed (10,000 calls): 1,986 ms

ToList size: 16,448 bytes
ToArray size: 16,024 bytes

Total guids created: 20,002,000

The code that generated this result is at the bottom of the post.

IEnumerable Gotcha

When I mentioned that I never return List<T> from a method and always choose T[], it was asked why I don't just return IEnumerable<T> instead of an array.

Do you happen to remember the phrase deferred execution? It basically means that any variable of type IEnumerable<T> might be a list of stuff and it might just be an execution plan that will return a list of stuff as the list is enumerated.

The gotcha in that last statement is that the execution plan will run every time the list is enumerated. Did you notice the last line of the above test result? I did that on purpose to show again that an IEnumerable<T> will execute over and over. This is why even though the size of the list of guids is 1000, the GetNewGuid() method was called over 20 million times during the test!
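A much smaller sketch makes the same point without a benchmark around it. The names here (NextValue, plan, snapshot) are mine, not from the test program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class DeferredExecutionExample
{
   private static int _calls;

   private static int NextValue(int x)
   {
      _calls++;
      return x * 2;
   }

   private static void Main()
   {
      // Nothing runs here: this line only builds the execution plan.
      IEnumerable<int> plan = Enumerable.Range(0, 5).Select(x => NextValue(x));
      Console.WriteLine("After building the query: {0} calls", _calls);   // 0

      foreach (int value in plan) { }
      Console.WriteLine("After first enumeration: {0} calls", _calls);    // 5

      foreach (int value in plan) { }
      Console.WriteLine("After second enumeration: {0} calls", _calls);   // 10

      // ToArray runs the plan one more time and keeps the results.
      int[] snapshot = plan.ToArray();
      foreach (int value in snapshot) { }
      foreach (int value in snapshot) { }
      Console.WriteLine("After ToArray and two loops: {0} calls", _calls); // 15
   }
}
```

Enumerating the array never touches NextValue again, which is exactly the behaviour most callers assume they are getting.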

So, you certainly can be careful when returning IEnumerable by calling ToArray() or ToList() to force the execution plan and return a finalized list. The problem for my little brain is that I constantly forget to do that! I'll run the application or the unit test, see wacky things happen or poor performance, and then go back and fix things properly. Or at least I used to when I was on the kick where every list was an IEnumerable no matter what. Now by default I use an array and only return IEnumerable when deferred execution is my intention (remember, all code should be self-documenting).
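To show what I mean by letting the return type document the intent, here is a rough sketch. CreateIds and StreamIds are hypothetical methods invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

internal static class ReturnTypeExample
{
   // Array return type: the work is finished before the caller sees the result.
   private static Guid[] CreateIds(int count)
   {
      return Enumerable.Range(0, count)
         .Select(x => Guid.NewGuid())
         .ToArray();
   }

   // IEnumerable return type: deferred on purpose, every enumeration
   // re-runs the iterator and produces fresh values.
   private static IEnumerable<Guid> StreamIds(int count)
   {
      for (int i = 0; i < count; i++)
         yield return Guid.NewGuid();
   }

   private static void Main()
   {
      Guid[] ids = CreateIds(3);
      Console.WriteLine(ids.First() == ids.First());       // True: same stored element

      IEnumerable<Guid> stream = StreamIds(3);
      Console.WriteLine(stream.First() == stream.First()); // False: a new guid per enumeration
   }
}
```

Neither signature is wrong; the point is that a caller can tell from the type alone whether enumerating twice is safe.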

Enough already!

Ya, ya... that's enough rambling. I really do hate long-winded blog posts! Here's the code I mentioned earlier that generated that result. Happy coding!

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

internal class Program
{
   private static int _numGuids;

   private static void Main()
   {
      IEnumerable<Guid> guids = Enumerable
         .Range(0, 1000)
         .Select(x => GetNewGuid());

      MeasureSpeed(guids);
      MeasureSize(guids);

      Console.WriteLine("Total guids created: {0:n0}", _numGuids);
   }

   private static void MeasureSpeed(IEnumerable<Guid> guids)
   {
      const int iterations = 10000;

      var stopwatch = new Stopwatch();
      stopwatch.Start();
      for (int i = 0; i < iterations; i++) guids.ToList();
      stopwatch.Stop();
      Console.WriteLine("ToList speed ({0:n0} calls): {1:n0} ms", iterations, stopwatch.ElapsedMilliseconds);

      stopwatch.Reset();
      stopwatch.Start();
      for (int i = 0; i < iterations; i++) guids.ToArray();
      stopwatch.Stop();
      Console.WriteLine("ToArray speed ({0:n0} calls): {1:n0} ms", iterations, stopwatch.ElapsedMilliseconds);

      Console.WriteLine();
   }

   private static void MeasureSize(IEnumerable<Guid> guids)
   {
      var startingMemory = GC.GetTotalMemory(true);

      var list = guids.ToList();
      var memoryAfterList = GC.GetTotalMemory(true);
      Console.WriteLine("ToList size: {0:n0} bytes", memoryAfterList - startingMemory);

      var array = guids.ToArray();
      var memoryAfterArray = GC.GetTotalMemory(true);
      Console.WriteLine("ToArray size: {0:n0} bytes", memoryAfterArray - memoryAfterList);

      Console.WriteLine();

      // Keep both collections reachable until after the measurements; otherwise an
      // optimized build may let the forced GC reclaim them before they are counted.
      GC.KeepAlive(list);
      GC.KeepAlive(array);
   }

   private static Guid GetNewGuid()
   {
      _numGuids++;
      return Guid.NewGuid();
   }
}


Linq to NHibernate Repository

by August 31, 2009 04:50 AM

A colleague at work asked for some guidance on creating a generic repository that uses Linq to NHibernate, and I thought I'd reply here instead of directly in case anyone else might find the information useful. First things first, here is the repository I use on a project at work. I'll talk through the interesting pieces of it below the code.

public interface IRepository
{
   ISession NHSsession { get; }

   /// <summary>
   /// Loads a proxy object with nothing but the primary key set.  
   /// Other properties will be pulled from the DB the first time they are accessed.
   /// Generally only use when you know you will NOT be wanting the other properties though.
   /// </summary>
   T Load<T>(object primaryKey);

   T Get<T>(object primaryKey);
   T Get<T>(Expression<Func<T, bool>> predicate);

   IQueryable<T> Find<T>();
   IQueryable<T> Find<T>(Expression<Func<T, bool>> predicate);

   T Add<T>(T entity);
   T Remove<T>(T entity);
}

public class Repository : IRepository
{
   static Repository()
   {
      _sessionFactory = Fluently.Configure()
         .Database(OracleDataClientConfiguration
            .Oracle9
            .ConnectionString(c => c.FromConnectionStringWithKey("CPSDsn"))
            .Driver("NHibernate.Driver.OracleClientDriver")
            .ShowSql()
         )
         .Mappings(mapping => mapping.FluentMappings.AddFromAssemblyOf<Repository>())
         .ExposeConfiguration(config => config.SetInterceptor(new AppInterceptor()))
         .BuildSessionFactory();
   }

   private static readonly ISessionFactory _sessionFactory;

   private static ISession _testingSession;
   public static ISession NHSession
   {
      get { return HttpContext.Current == null ? _testingSession : HttpContext.Current.Items["_nhSession"] as ISession; }
      set
      {
         if (HttpContext.Current == null)
            _testingSession = value;
         else
            HttpContext.Current.Items["_nhSession"] = value;
      }
   }

   public static void BeginUnitOfWork()
   {
      if (NHSession != null)
         throw new ApplicationException("Unit of Work already started");

      NHSession = _sessionFactory.OpenSession();
      NHSession.FlushMode = FlushMode.Commit;
      NHSession.BeginTransaction();
   }

   public static void EndUnitOfWork()
   {
      if (NHSession == null) return;

      NHSession.Transaction.Rollback();
      NHSession.Dispose();
      NHSession = null;
   }

   public static void SubmitChanges()
   {
      try
      {
         NHSession.Transaction.Commit();
         NHSession.BeginTransaction();
      }
      catch
      {
         NHSession.Transaction.Rollback();
         throw;
      }
   }

   public static void CloseSessionFactory()
   {
      _sessionFactory.Dispose();
   }

   /*******************************************************************************/
   /*******************************************************************************/

   public Repository()
   {
      _session = NHSession;
   }

   private readonly ISession _session;

   public ISession NHSsession { get { return _session; } }

   public T Load<T>(object primaryKey)
   {
      return _session.Load<T>(primaryKey);
   }

   public T Get<T>(object primaryKey)
   {
      return _session.Get<T>(primaryKey);
   }

   public T Get<T>(Expression<Func<T, bool>> predicate)
   {
      return Find<T>().SingleOrDefault(predicate);
   }

   public IQueryable<T> Find<T>()
   {
      return _session.Linq<T>();
   }

   public IQueryable<T> Find<T>(Expression<Func<T, bool>> predicate)
   {
      return Find<T>().Where(predicate);
   }

   public T Add<T>(T entity)
   {
      _session.Save(entity);
      return entity;
   }

   public T Remove<T>(T entity)
   {
      _session.Delete(entity);
      return entity;
   }
}

If I were looking at this for the first time, I think these things would jump out at me:

  • Static constructor - we're not using an IoC container to manage the NH session or session factory, so I saw this as the best way to reliably initialize the session factory. Remember that static constructors are guaranteed to be thread-safe and are only called once. Pretty much exactly what we need for the NH session factory.
  • All the other static members - since we're not managing the NH session with IoC either, I needed a way to get a session. Also, since this is from a web application, I wanted the unit of work to be per HTTP request. So in the Global.asax events BeginRequest and EndRequest, I call Repository.BeginUnitOfWork() and Repository.EndUnitOfWork(). And of course I didn't want any unexpected DB changes, so you have to explicitly tell the repository to submit changes; otherwise everything gets rolled back when ending the unit of work. The only thing remaining is the call to close the session factory. This is only used in unit tests.
  • All the NH specific stuff - yes, yes, I know. This isn't a repository you could plug into any ORM solution. I used to balk at such things that were implementation specific, but at some point I realized I was missing out on benefits of that chosen implementation. In fact, I'll talk to those benefits next.
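For the per-request unit of work, the Global.asax wiring might look roughly like the sketch below. It is not a copy of the real file: it assumes an ASP.NET project that also contains the Repository class above, and the class name Global is just the usual default.

```csharp
using System;
using System.Web;

// Sketch only: compiles alongside the Repository class shown earlier.
public class Global : HttpApplication
{
   // ASP.NET wires up handlers with these names automatically.
   protected void Application_BeginRequest(object sender, EventArgs e)
   {
      // Opens a session and starts a transaction for this request.
      Repository.BeginUnitOfWork();
   }

   protected void Application_EndRequest(object sender, EventArgs e)
   {
      // Rolls back whatever was not committed, then disposes the session.
      Repository.EndUnitOfWork();
   }
}
```

Code that really does want its changes saved calls Repository.SubmitChanges() somewhere in between; anything that forgets to gets rolled back at the end of the request.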

Having talked to those things that JUMP out at you at first glance, the rest is pretty straightforward. The repository gives you what you need to get entities, get lists of entities, and of course add and remove entities from the DB.

The two methods that are specific to NH (I think) are the Load and Get methods (the Get that takes an object as primary key). Here are the reasons for their existence.

  • T Get<T>(object primaryKey) - if you use NHibernate's Get method, you'll get the added benefit of knowing NH might not have to go to the DB for that entity. If NH already has that entity loaded due to some previous call, then it will just return the one it already has. That's just freaking cool! If however, you use the other Get method that takes a predicate and uses the Linq method SingleOrDefault to get the entity, you'll hit the DB every time even if you're passing in the same predicate every time. Not cool.
  • T Load<T>(object primaryKey) - this one is very cool. If NH already has the requested object in memory, it will return to you the real deal. If not, however, NH does not go to the DB to get it. Instead a proxy object is returned with nothing but the primary key set. You can use that object just as you would the real thing (pass it to constructors, use it as a parameter, etc). The intention is to use it in situations where a reference to the entity is needed, but only for the sake of the relationship (FK in the DB usually), or to get to the primary key value. As a simple example, here is the body of one of the remove methods on one of my repositories: _repository.Remove(_repository.Load<MinorLine>(minorLineId));
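Side by side, the three calls could be used something like this. It is a sketch that compiles only alongside the IRepository interface above; MinorLineService and the Id and Name properties on MinorLine are assumptions made up for the example.

```csharp
// Hypothetical entity: only the class name comes from the post.
public class MinorLine
{
   public virtual int Id { get; set; }
   public virtual string Name { get; set; }
}

public class MinorLineService
{
   private readonly IRepository _repository;

   public MinorLineService(IRepository repository)
   {
      _repository = repository;
   }

   public MinorLine FindById(int id)
   {
      // Primary-key Get: NH can hand back an instance the session already holds.
      return _repository.Get<MinorLine>(id);
   }

   public MinorLine FindByName(string name)
   {
      // Predicate Get: goes through Linq and SingleOrDefault, so it queries every time.
      return _repository.Get<MinorLine>(x => x.Name == name);
   }

   public void Delete(int id)
   {
      // Load: only a reference to the entity is needed here, not its data.
      _repository.Remove(_repository.Load<MinorLine>(id));
   }
}
```

The rule of thumb I take from this: if you have the primary key, use it, and only reach for the predicate overload when you don't.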

I think that's about it. I hope you find it useful or that it at least sparks ideas for your own repository implementation.
