Repository Tip - Encapsulate Query Logic
The Repository design pattern is one of the most popular patterns in .NET development today. However, depending on its specific implementation, its benefits to the system's design may vary. One thing to watch out for is query logic leaking out of the repository implementation.
Sponsor - DevIQ
Thanks to DevIQ for sponsoring this episode! Check out their list of available courses and how-to videos.
Show Notes / Transcript
Last week I talked about Design Patterns in general, and how in most cases it makes sense to have basic familiarity with a breadth of patterns, but to go deep on the ones that are most valuable in your day-to-day development. Repository is one of a handful of patterns I've found to be useful in virtually every ASP.NET app I've been involved with over the last ten years or so. Before I knew about this pattern, I'd already learned that separation of concerns was a good idea, and that having a separate layer or set of types for data access was beneficial. The biggest benefit you get by using a Repository instead of a Data Access Layer or static DB helper class is reduced coupling, because you can follow the Dependency Inversion Principle. By the way, if you're not familiar with any of these terms or principles, there are links on the show notes page at weeklydevtips.com/018, where you'll also find a link to my recommended generic repository implementation.
This week's tip assumes you're already at least basically familiar with the repository pattern. Lately, I've been spending most of my time helping a variety of teams write better software, and a pretty common issue I find in apps that use the repository pattern is that query logic can leak out. This can result in code and concept duplication, which violates the Don't Repeat Yourself, or DRY, principle. It can also result in runtime errors if query expressions are used that LINQ-to-Entities cannot translate into SQL.
The most common reason for this issue is repository List implementations that return IQueryable results. An IQueryable result is an expression, not a true collection type. It can be enumerated, but until it is, the actual translation from the expression into a SQL query isn't performed. This is referred to as deferred execution, and it does have some advantages. For instance, if you have a repository method that returns a list of customers, and you only want those whose last name is 'Smith', it can dramatically reduce how much data you need to pull back from the database if you can apply the LastName == "Smith" filter before the database query is made.
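To make deferred execution concrete, here is a minimal sketch. The Customer class and the in-memory list are assumptions standing in for a real database table; against Entity Framework, the same Where call would be translated into a SQL WHERE clause instead of running in memory. The point it illustrates is only that building the query does no work until the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; the episode only mentions customers with a last name.
public class Customer
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public static class DeferredExecutionDemo
{
    public static int Run()
    {
        // An in-memory list stands in for the database (assumption).
        var customers = new List<Customer>
        {
            new Customer { FirstName = "Ann", LastName = "Smith" },
            new Customer { FirstName = "Bob", LastName = "Jones" },
        };

        // Building the query only composes an expression; nothing is evaluated yet.
        IQueryable<Customer> smiths = customers
            .AsQueryable()
            .Where(c => c.LastName == "Smith");

        // Added after the query was defined, but before it was enumerated.
        customers.Add(new Customer { FirstName = "Cid", LastName = "Smith" });

        // ToList() enumerates the query, so the filter runs here and sees both Smiths.
        List<Customer> result = smiths.ToList();
        return result.Count;
    }
}
```

Calling DeferredExecutionDemo.Run() counts both customers named Smith, including the one added after the query was written, because the filter is not evaluated until ToList() is called.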
But where are you going to add the query logic that says you only want customers named 'Smith'? That sort of thing is often done in the UI layer, perhaps in an MVC Controller action method. For something very simple, it's hard to see the harm in this. But imagine that instead of filtering for customers named 'Smith', you were writing a filter that would list the optimal customers to target for your next marketing campaign, using a variety of customer characteristics and perhaps some machine learning algorithms. Once you start putting your query logic in the UI, it's going to start to multiply, and you're going to have important business logic where it doesn't belong. This makes your business logic harder to isolate and test, and makes your UI layer bloated and harder to work with.
The problem with the IQueryable return type from repositories is that it invites this kind of thing. Developers find it easy to build complex filters using LINQ and lambda expressions, but rarely take the time to see whether they're reinventing the wheel with a particular query. That this approach is easy to justify, by pointing to the benefits of deferred execution and perhaps to the notion that the underlying repository List method is getting a lot of reuse, only exacerbates the problem. The underlying problem with returning IQueryable is that it breaks encapsulation and leaks data access responsibilities out of the repository abstraction, where they belong.
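A rough sketch of what this leak tends to look like follows. Every type name here (ILeakyCustomerRepository, InMemoryLeakyCustomerRepository, CustomersController, ReportsController) is hypothetical, and the controllers are plain classes rather than real ASP.NET MVC controllers so the example stays self-contained. An in-memory list again stands in for the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for illustration.
public class Customer
{
    public string LastName { get; set; } = "";
}

// Leaky abstraction: callers receive an expression they can keep modifying.
public interface ILeakyCustomerRepository
{
    IQueryable<Customer> List();
}

public class InMemoryLeakyCustomerRepository : ILeakyCustomerRepository
{
    private readonly List<Customer> _customers;

    public InMemoryLeakyCustomerRepository(List<Customer> customers) => _customers = customers;

    public IQueryable<Customer> List() => _customers.AsQueryable();
}

// Stand-in for an MVC controller (assumption: no ASP.NET dependency).
public class CustomersController
{
    private readonly ILeakyCustomerRepository _repository;

    public CustomersController(ILeakyCustomerRepository repository) => _repository = repository;

    public List<Customer> Index(string lastName)
    {
        // Query logic now lives in the UI layer.
        return _repository.List().Where(c => c.LastName == lastName).ToList();
    }
}

// A second caller that needs the same rule ends up restating it.
public class ReportsController
{
    private readonly ILeakyCustomerRepository _repository;

    public ReportsController(ILeakyCustomerRepository repository) => _repository = repository;

    public int CountByLastName(string lastName)
    {
        // The same filter, duplicated; a change to the rule must be made in both places.
        return _repository.List().Count(c => c.LastName == lastName);
    }
}
```

With a one-line filter the duplication looks harmless, but the same structure applies to the marketing-campaign style of query described earlier, where the duplicated expression is real business logic.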
Rather than returning IQueryable, repositories should return IEnumerable or even just List types. Doing so consistently will ensure there is no confusion among developers as to whether the result of a repository is an in-memory result or an expression that can still be modified before a query is made. But then how do you allow for different kinds of queries, without performing them all in memory? There are a few different approaches that can work, and I'll cover them in future tips, but the simplest one is to add additional methods to the Repository as needed. This is often a good place to start, as it is simple and discoverable. In the example I'm using here, the CustomerRepository class could have a new method called ListByLastName added to it, which accepts a lastName parameter and returns all customers with that last name. Likewise, a collection of customers fitting certain characteristics for a new marketing campaign would be returned by another appropriately-named method. Over time, this may result in repositories with a lot of different methods, but this is preferable to having query logic scattered across the UI and possibly other parts of your application (and we'll see how to fix this soon).
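One way the ListByLastName approach might look is sketched below. Only the ListByLastName method name and its lastName parameter come from the episode; the ICustomerRepository interface, the in-memory implementation, and the controller are assumptions made to keep the example self-contained. In a real application the implementation would query a database context instead of a list, applying the filter before materializing the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for illustration.
public class Customer
{
    public string LastName { get; set; } = "";
}

// The abstraction exposes named queries and returns materialized results.
public interface ICustomerRepository
{
    List<Customer> List();
    List<Customer> ListByLastName(string lastName);
}

// In-memory stand-in for a database-backed repository (assumption).
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers;

    public InMemoryCustomerRepository(List<Customer> customers) => _customers = customers;

    // Returns a copy so callers cannot mutate the repository's backing list.
    public List<Customer> List() => _customers.ToList();

    // The filter is applied inside the repository, before ToList() runs the query.
    public List<Customer> ListByLastName(string lastName) =>
        _customers.AsQueryable()
                  .Where(c => c.LastName == lastName)
                  .ToList();
}

// Stand-in for an MVC controller: it asks for what it needs by name.
public class CustomersController
{
    private readonly ICustomerRepository _repository;

    public CustomersController(ICustomerRepository repository) => _repository = repository;

    public List<Customer> Index(string lastName) => _repository.ListByLastName(lastName);
}
```

The controller no longer contains any query expression, so the rule for "customers with this last name" exists in exactly one place and can be tested against the repository directly.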
Would your team or application benefit from an application assessment, highlighting potential problem areas and identifying a path toward better maintainability? Contact me at ardalis.com and let's see how I can help.