All Abstractions Are Failed Abstractions

June 30, 2009

In programming, abstractions are powerful things:

Joel Spolsky has an article in which he states

All non-trivial abstractions, to some degree, are leaky.

This is overly dogmatic - for example, bignum classes are exactly the same regardless of the native integer multiplication. Ignoring that, this statement is essentially true, but rather inane and missing the point. Without abstractions, all our code would be completely interdependent and unmaintainable, and abstractions do a remarkable job of cleaning that up. It is a testament to the power of abstraction and how much we take it for granted that such a statement can be made at all, as if we always expected to be able to write large pieces of software in a maintainable manner.

But they can cause problems of their own. Let's consider a particular LINQ to SQL query, designed to retrieve the most recent 48 Stack Overflow questions.

var posts = 
  (from p in DB.Posts
  where 
  p.PostTypeId == PostTypeId.Question &&
  p.DeletionDate == null &&
  p.Score >= minscore
  orderby p.LastActivityDate descending
  select p).
  Take(maxposts);

The big hook here is that this is code the compiler actually understands. You get code completion, compiler errors if you rename a database field or mistype the syntax, and so forth. Perhaps best of all, you get an honest to goodness post object as output! So you can turn around and immediately do stuff like this:

foreach (var post in posts.ToList())
{
    Render(post.Body);
}

Pretty cool, right?

Well, that Linq to SQL query is functionally equivalent to this old-school SQL blob. More than functionally, it is literally identical, if you examine the SQL string that LINQ generates behind the scenes:

string query = 
  "select top 48 * from Posts
  where 
  PostTypeId = 1 and 
  DeletionDate is null and 
  Score >= -4
  order by LastActivityDate desc";

This text blob is of course totally opaque to the compiler. Fat-finger a syntax error in here, and you won't find out about it until runtime. Even if it does run without a runtime error, processing the output of the query is awkward. It takes row level references and a lot of tedious data conversion to get at the underlying data.

var posts = DB.ExecuteQuery(query);

foreach (var post in posts.ToList());
{
   Render(post["Body"].ToString());
}

So, LINQ to SQL is an abstraction -- we're abstracting away raw SQL and database access in favor of native language constructs and objects. I'd argue that Linq to SQL is a good abstraction. Heck, it's exactly what I asked for five years ago.

But even a good abstraction can break down in unexpected ways.

Consider this optimization, which is trivial in the old-school SQL blob code: instead of pulling down every single field in the post records, why not pull just the id number? Makes sense, if that's all I need. And it's faster -- much faster!

select top 48 * from Posts827 ms
select top 48 Id from Posts260 ms

Selecting all columns with the star (*) operator is expensive, and that's what LINQ to SQL always does by default. Yes, you can specify lazy loading, but not on a per-query basis. Normally, this is a non-issue, because selecting all columns for simple queries is not all that expensive. And you'd think pulling down 48 measly little post records would be squarely in the "not expensive" category!

So let's compare apples to apples. What if we got just the id numbers, then retrieved the full data for each row?

select top 48 Id from Posts260 ms
select * from Posts where Id = 123453 ms

Now, retrieving 48 individual records one by one is sort of silly, becase you could easily construct a single where Id in (1,2,3..,47,48) query that would grab all 48 posts in one go. But even if we did it in this naive way, the total execution time is still a very reasonable (48 * 3 ms) + 260 ms = 404 ms. That is half the time of the standard select-star SQL emitted by LINQ to SQL!

An extra 400 milliseconds doesn't sound like much, but slow pages lose users. And why in the world would you perform a slow database query on every single page of your website when you don't have to?

It's tempting to blame Linq, but is Linq really at fault here? These seem like identical database operations to me:

1. Give me all columns of data for the top 48 posts.

or

1. Give me just the ids for the top 48 posts.
2. Retrieve all columns of data for each of those 48 ids.

So why in the wide, wide world of sports would one of these seemingly identical operations be twice as slow as the other?

The problem isn't Linq to SQL. The problem is that we're attempting to spackle a nice, clean abstraction over a database that is full of highly irregular and unusual real world behaviors. Databases that:

  • may not have the right indexes
  • may misinterpret your query and generate an inefficient query plan
  • are trying to perform an operation that doesn't fit well in available memory
  • are paging data from disks which might be busy at that particular moment
  • might contain irregularly sized column datatypes

That's what's so frustrating. We can't just pretend all our data is formatted into neat, orderly data structures sitting there in memory, lined up in convenient little queues for us to reach out and casually scoop them up. As I've demonstrated, even trivial queries can have bizarre behavior and performance characteristics that are not at all clear.

To its credit, Linq to SQL is quite flexible: we can use strongly typed queries, or we can use SQL blob queries that we cast to the right object type. That flexibility is critical, because so much of our performance depends on these quirks of the database. We default to the built-in Linq language constructs, and drop down to hand-tuning ye olde SQL blobs where the performance traces tell us we need to.

Either way, it's clear that you've got to know what's happening in the database every step of the way to even begin understanding the performance of your application, much less troubleshoot it.

I think you could make a fairly solid case that Linq to SQL is, in fact, a leaky and failed abstraction. Exactly the kind of thing Joel was complaining about. But I'd also argue that virtually all good programming abstractions are failed abstractions. I don't think I've ever used one that didn't leak like a sieve. But I think that's an awfully architecture astronaut way of looking at things. Instead, let's ask ourselves a more pragmatic question:

Does this abstraction make our code at least a little easier to write? To understand? To troubleshoot? Are we better off with this abstraction than we were without it?

It's our job as modern programmers not to abandon abstractions due to these deficiencies, but to embrace the useful elements of them, to adapt the working parts and construct ever so slightly less leaky and broken abstractions over time. Like desperate citizens manning a dike in a category 5 storm, we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity.

As much as I may curse Linq to SQL as yet another failed abstraction, I'll continue to use it. Yes, I may end up soggy and irritable at times. But it sure as heck beats drowning.

Posted by Jeff Atwood
159 Comments

not true. "All abstractions that you notice are failed abstractions"

the fact that the database stores blobs of data on disk, not tables or columns or indexes is an abstraction. This one you dont notice

The mirage that the disk really has blocks or files or anything except long streams of ones and zeros is an abstraction too.

THe mirage that the disk has long steams of ones and zeros is an abstraction - in fact it has only areas of varying magnetism spinning on platters at high speed . Unless of course its a USB drive, or a CD, or a ram disk.

pm100 on June 30, 2009 2:02 AM

I'm no fan of LINQ, but, as other commenters already stated, your example is wrong. You need to select the ID field in your LINQ query.

I don't disagree that one needs to know everything about how their backing store to get the best performance. Given that, I see no reason at all to use LINQ.

If I have a DAL that can get me data into a DataTable in one function call, and that data can be passed up the stack (unlike the LINQ result, which can't, because it's an anonymous type), I don't see how LINQ buys me ANYTHING at all.

Someone mentioned using a background thread to refresh a cache. Wow, talk about swatting a fly with a cannon. How about just timing out the cache item and upon retrieval checking the timeout, and if it's expired, re-query the database, otherwise return the cache item. No other threads involved, no possible multi-threaded mistakes, etc. Easy-peasy. Of course, this assumes on actually designed a DAL and the DAL's API, and didn't just hack together a mess of code "to make it work."

Robert Barth on June 30, 2009 2:07 AM

I agree with Joel's statement that all non-trivial abstractions are leaky in the sense that, by their very nature, abstractions hide details that are considered less important at that particular level. An abstraction has "leaked" the details that it abstracted away.[1]

I disagree, however, with your statement that all abstractions are "failed" abstractions, mostly because you've not defined failure. You've even provided a pragmatic point-of-view where abstractions are not failures (usability, and ease of understanding).

Basically, if an abstraction hides the details that we are concerned with, then it is our job as developers to choose a different (less abstract) abstraction. If you are concerned with query timing and LINQ doesn't give you fine-grained control over that, then LINQ is the wrong abstraction to use.

Abstractions only fail when they are wrong tool for the job, but in that case it's the developer that has failed.

[1] Of course, it *is* possible to prove that any given abstraction has a one-to-one correlation with what it is abstracting, thus making it a "leakless" abstraction. For instance, one might be able to prove that the SQL grammar is fully covered by the LINQ grammar. But in Joel's argument, I'll assume he considers such an abstraction to fall into the "trivial" category.

jasonmray on June 30, 2009 2:18 AM

Very interesting post.
I am curious to know if you use the LINQ or the SQL blob method to implement the ruby on rails style "counter caches" for lets say "the number of votes on a answer" in stack overflow.
Doing it with LINQ you have to retrieve the parent answer row for the child votes rows and change the count on the retrieved answer object and then save it back while handling any concurrency issues.
On the other hand using a SQL blob just entails wrapping in a transaction, an insert into the votes table and an update to the vote count in the counter cache column of the parent answer row.
It seems like using LINQ would be much mode complicated and less efficient.

Areg Sarkissian on June 30, 2009 2:21 AM

I've said it for a while, that what they should have done instead of Linq to SQL, is add support for inline SQL, like they have support for inline XML in the newest VB.Net. This way, you could have the compiler check the sql for you, and even have dynamic objects returned with the correct properties set up, but you wouldn't have to learn some backwards SQL language, and you could fine tune your queries exactly the same way you had been doing for decades.

Kibbee on June 30, 2009 2:47 AM

Goal of every abstraction is "Reducing Complexity". The price we've to pay is "Implementation Overhead".

As long as "Overhead" are acceptable, abstractions are "Not leaky".

In another way why "Abstraction" can be "Leaky" is the "Level" in which they operate. Higher the "Level" more "Leaky". (LINQ vs TCP, one operating at very high level and another at very low level)

Murthy on June 30, 2009 3:18 AM

The essence of your argument is that Linq can sometimes be slow. The solution you pose is to increase the complexity of your code to make it quicker. To do this you propose getting into the guts of SQL Server, and the abstraction layers. However, there is a huge cost associated with doing that -- namely maintainability and complexity. The best code is the code that you don't have to write.

You are correct that slow pages loose users, however 404s and Web page exceptions loose a lot more.

The obvious solutions to the problem you state are:
1. Buy another server.
2. If that doesn't work, buy another one.
3. Repeat 1 and 2 until your problem goes away.

Servers are much cheaper than fixing even a few dozen lines of program code.

In the absence of this:
1. Profile your code to find the 1-5% that is the bottle neck.
2. Rewrite that database access as an SP.
3. Modify your Linq to use the SP.

JessicaBoxer on June 30, 2009 3:30 AM

Jeff, You need to be very careful about Top(n) queries because if you don't have the right index on the appropriate columns you have to scan the complete result set before you can sort it, and only then take your 'top' results.

Based on your post, and to second someones suggestion, why not just select the body column (rather than the Id as other's have suggested) - as that implies an extra step (getting the data that goes with that Id).

Tracing and Explain Plan are so easy in Sql Server that I would strongly recommend that you learn how to use them. As you are using Linq stick Sql Profiler running on your test DB instance and watch all the commands you are running against that DB. It's educational... especially as you will get to see any behind-the-scenes queries that Linq is doing - perhaps of the Data model.

Finally, therefore, I don't agree with you that the abstraction is leaky in the way you think (I don't know Linq well, but am pretty reasonable with SQL). You are just experiencing the joys of performance tuning! Just because it is slow to do string concatenation in C# (compared to Stringbuilder) does not make strings a bad abstraction for, ahem, strings; the implementation just makes it sub-optimal in certain circumstances :)

Nij on June 30, 2009 3:47 AM

i agree that you cannot, shouldnot, as a developer just blindly use an abstraction and just expect it to always be the most optimal method for any case. While some abstractions are simple and I don't necessarily understand 100% but can be confident of being relatively optimal for most cases (e.g string.length method etc), something as extreme as abstracting your entire data access layer REQUIRES you to understand what its doing under the covers, or just live with any in-efficiencies that occur. I just don understand developers who implement some pattern or architecture and think that just because it makes sense and is fast for some cases, that it will always handle any situation in the most optimal matter.

Rob on June 30, 2009 3:52 AM

Ummm, it's called "there's no free lunch". The trade-off for LINQ to SQL is less control, less speed, etc.

Steve on June 30, 2009 3:59 AM

I hate it when programmers use ToList() when it isn't necessary. Then inevitably, other programmers see it on blogs like this and the next thing you know, it's everywhere.

Gr8Ray on June 30, 2009 4:11 AM

I'd say the only failed abstractions are ones that either bringing nothing to table or don't hide any details. In your particular example, LINQ to SQL allows the compiler to help you identify query mistakes which increases productivity. The programmer is still free to identify pain points and bottlenecks and then address the particulars at the point of a problem. Solving existing problems over the long run is better than solving all those potential problems you don't have yet. Abstractions let us ignore details until they are important. That's their job.

MrPhil on June 30, 2009 4:26 AM

To the people who ask why they should use LINQ if it still requires understanding SQL:

The answer is that writing Linq-to-SQL code is much less work then alternative approaches (like creating/reading from SqlCommand objects).

I really don't get why this example was chosen, its pretty trivial to change the LINQ query to only grab the ID column.

fschwiet on June 30, 2009 4:40 AM

@fschwiet: Once you've written a class that can read data from a stored procedure into a DataTable in a generic way, you'll never have to do it again (I haven't had to update mine in years). One function call and I've got data from the database. And I didn't have to use an anonymous type or that horrible lambda expression nonsense.

Robert Barth on June 30, 2009 6:21 AM

Oracle Pro C/C++ does the same thing using native SQL. I wonder why Microsoft won't let programmers use SQL.

Darrin on June 30, 2009 8:07 AM

Interesting post, and something I do keep coming back to when I consider higher level web frameworks such as Ruby / ActiveRecord Vs PHP / SQL. I suppose it depends on how much complexity the abstraction saves you having to deal with (and perhaps security it provides / doesn't provide) versus the potential efficiency "leaks". Difficult call to make?

Dave on June 30, 2009 8:25 AM

Erm, should have said "higher level web programming languages" there.

Dave on June 30, 2009 8:25 AM

I'm not sure why you blame this behavior on Linq and you've proven that it is unexpected behavior and happens even if you aren't using Linq. If you haven't proven that then I've misunderstood you and you need to go back and do a little more research. Anyway, it seems that your point is that Linq chooses a default method that is sometimes less than optimal in "weird" situations. That's not saying much in my opinion.

Matt on June 30, 2009 8:26 AM

Jeff , i think i followed the solution you proposed in recent project. We started with defaulted LINQ 2 SQL operations but then restricted the set of columns we require to perform in each operation by using custom Value Objects. Hence we optimized where we needed and use defaulted where we know it does not matter (at the end very rear cases left though lolz).

Usman on June 30, 2009 8:32 AM

This was a good read! I still don't use LINQ to SQL.

Josh Stodola on June 30, 2009 8:33 AM

The problem with LINQ is that you have to learn LINQ and you still have to know SQL in all details in order to optimize and debug LINQ queries.
So if LINQ abstraction doesn't protect you from complexity of dealing with SQL -- then what's the point of dealing with LINQ?

I'm not surprised that Microsoft is on the way of abandoning LINQ.

Dennis Gorelik on June 30, 2009 8:37 AM

Matt's right; you seem to have shot your thesis to hell by having identical results after removing Linq from the equation. It looks like you want to blame Linq, realize you can't, then just throw up your hands and say "it's all just bad abstractions!"

It looks to me like the problem is in the SQL engine. I'd be interested to see the timing of the same operations in different SQL engines (SQL Server, Oracle, MySQL, PostgreSQL), controlling for syntax and feature variations between them.


By the way, the ORM I'm most familiar with is ActiveRecord, which is good about making the common things simple while allowing (or encouraging) dropping to SQL for the more complex things.

rfunk on June 30, 2009 8:42 AM

I hate to nag, but I think you misunderstood Joel's point. I think he was getting at the very fact you pointed out: you have to be careful about treating abstractions as if they're really as clean as they look. You really need to understand how it meshes with at least a layer or two up and down to avoid bad algorithms and performance-sapping holes.

Aurrin on June 30, 2009 8:50 AM

@Dennis,

Can you provide any evidence that Microsoft is abandoning LINQ? I know they are combining the LINQ-to-SQL and the EF teams, but I have not heard anything about the abandonment of LINQ.

Nathan on June 30, 2009 8:55 AM

am i the only one here grossed out by this? if i had my DBA hat on, i'd also be carrying a DBA Knife to kill you in your sleep.

i understand the programmer allure, but damn... * selects by default?

janos erdelyi on June 30, 2009 8:58 AM

Can't you just select just the ID in the linq2sql query?

var posts =
(from p in DB.Posts
where
p.PostTypeId == PostTypeId.Question &&
p.DeletionDate == null &&
p.Score >= minscore
orderby p.LastActivityDate descending
select p.Id). // -- magic here, select only the ID
Take(maxposts);

Justin Chase on June 30, 2009 9:00 AM

If you want perfect abstractions, start programming this: http://opa.yale.edu/news/article.aspx?id=6764

lc on June 30, 2009 9:00 AM

I don't think the statement: "All non-trivial abstractions, to some degree, are leaky." is true.

Some abstractions are ill-fitting and leaky, but not all. You can, for instance, use C to complete any work done in assembler. There may actually be a few OS specific usages, but encapsulated they don't leak out past the OS. With a higher abstraction like C, you can get more work accomplished (and it is more portable).

Regular expressions are another example of something that fully covers their expression space. Of course, if you try to go beyond their expressive capabilities and attempt to do something crazy like write a C compiler entirely out of REs, you will fail. But that's not leaking, it is just misuse, somewhat equivalent to using a hammer to pound in screws.

Paul.

Paul W. Homer on June 30, 2009 9:00 AM

Why not just use SQL? It's much more straightforward. I don't understand the desire to rewrite the SQL to some object notation. All it does is add unnecessary overhead to the application. Is looping through a SQL query really all that difficult?

Practicality on June 30, 2009 9:01 AM

You always have to understand all the details behind an abstraction to use it well. Java resource leaks are a really good example of this. "You don't have to allocate your own memory!" but you better understand how you are using it.

I don't think this means an abstraction has failed, it just means you can't pretend complexity doesn't exist.

asdf on June 30, 2009 9:03 AM

In my experience Linq doesn't generate * selects if you made a proper DBML, but I can't ascertain that for sure right now.

However, if you want to retrieve only the IDs, LINQ2SQL gives you a couple of pretty good options. You can either use a stored procedure method, in which case your SQL gets validated by the database, or you can just create a smaller PostStub class only with the bits that you need.

As far as abstractions go, LINQ2SQL is awesome. It's not complex enough to be hard to manage, and it's still got enough features that you can massage it into doing whatever you want in a fairly elegant way. What else could you want?

Ilia Jerebtsov on June 30, 2009 9:04 AM

Also a good framework just tries to conform to the 90/10 rule. It aught to easily solve the problem 90% of the time and ensure that the other 10% of the time it's still possible to solve the problem even if awkward or hard.

Justin Chase on June 30, 2009 9:05 AM

Jeff, first off I just wanted to say that I've been following your blog for a while now and I've enjoyed your articles. Keep up the good work.

As for the matter at hand, I won't argue the leaky abstraction part, but I do question your implementation. Why not have the top 48 questions cached in memory on the application server? Then have a background thread that updates that list every N seconds? Is it really so important that every user request get the exact recent 48 questions at that moment? Can't you live with a little latency in the name of performance? Can the user not live with a list that is 3 seconds out of date (hell it might be out of date by the time is takes to service his/her request anyways)?

I agree that, looked at closely enough, all abstractions leak. However, there may be other approaches to the problem that allow you to sidestep that leak.

rzezeski on June 30, 2009 9:07 AM

I think the native integer size of the processor would affect the bignum class, in that operations that fit within the native size would run more quickly than those that don't. Here's a test, in Python:
bignum = (2**20)
count = 10000000
while count:
x = bignum + 1
count -= 1

This takes about 3.5 seconds on my computer for all bignum values up to 2**30. If you set bignum to be 2**31, it takes closer to 5 seconds, and continues to take about that long as you increase bignum. Python's arbitrary-size integer support is a leaky abstraction, in that it's slower to do anything with large numbers than it is with small numbers.


Nicholas on June 30, 2009 9:09 AM

@janos erdelyi

Indeed.

What is actually going on behind the scenes in SQL server when you SELECT * is an initial query to the master DB system tables to get references to all the columns in your specified table which is then used to build the explicit list of columns for the query you meant.

Now, I expect this process is more than just a bit optimized - SELECT * is after all a pretty ubiquitous shortcut, but it is still a shortcut and hence has a cost, and it's one of the first things any performance-obsessed DBA (i.e all of 'em!) will tell you to stop doing.

As to why just getting the ID column is quicker, that's because for most common SQL server DB designs an ID/autonumber field serves as both the primary key and the clustered index for that table, which is going to get massively optimized as well.

I find for any complex queries across very large data sets the only acceptable solution is to *first* fetch the set of IDs/keys that match the filter criteria, stick them in a temporary table and then join that on the named list of specific columns you need to SELECT from the original table, and, more likely, any other related ones too.

And ideally this should all be done within a stored procedure so you get the benefits of pre-compilation of the SQL query plan as well as not having to make multiple network calls from web to DB server.

I have no idea whether LINQ supports the column/field mapping + editor integration with stored procedure outputs, but if it doesn't I won't even consider it.

Xiani on June 30, 2009 9:15 AM

I don't see anyone else commenting on this, but why call ToList on an IQueryable object? IQueryable implements IEnumerable, so calling ToList basically forces you to process the list of results twice:

* Once to convert the result to a List
* Once to process the items in the for loop

Since this is an article aimed at performance, wouldn't it be a quick (if somewhat trivial) win?

Adrian Anttila on June 30, 2009 9:19 AM

Most abstractions have a tradeoff. Often the cost is less flexibility and lower performance. But what you get is usually productivity, reliability, safety, etc. Over time, though, the tradeoff goes from being a bad idea in all but trivial situations up to the status quo. You seem to have left out the fact that SQL itself is an abstraction and is even used as an example in the linked Joel Spolsky article!

"The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster."

But the same is true about most programming languages. Look at C. Originally, programmers still dipped down to assembly for the most intensive code. That started to happen less over time as computers got faster. Then people started building on C, and C was the new "lowest level". It happens all the time.

Your conclusion shows that you and I aren't actually in disagreement. Some leaks get fixed, some leaks stop mattering. Over time, some abstractions get thrown away, and others get built on with more abstractions. What I will disagree with you on, is your title which I'm sure was just intended to be a little inflammatory. Just because it leaks, does not mean it failed.

Russ on June 30, 2009 9:20 AM

It still seems to me that the cons of this particular abstraction outweigh the pros.

Practicality on June 30, 2009 9:27 AM

While I agree the database is the main suspect, I don't think it is guilty of a performance offence.

During the second example, the database might have cached the row ids from the first query knowing you were going to ask for the full rows in the next query.

It might not have the rows already in its hands, but it knew exactly where to retrieve them from the disks.

Disclaimer: I'm not a database developer, but it is what I'd do if I was one.

Bruno Laturner on June 30, 2009 9:29 AM

While I agree the database is the main suspect, I don't think it is guilty of a performance offence.

During the second example, the database might have cached the row ids from the first query knowing you were going to ask for the full rows in the next query.

It might not have the rows already in its hands, but it knew exactly where to retrieve them from the disks.

Disclaimer: I'm not a database developer, but it is what I'd do if I was one.

Bruno Laturner on June 30, 2009 9:29 AM

Absolutely agree with the core statement here: Abstractions are great until you hit their limits.

Abstractions are great for what they're there for: Making code much easier to read and write. Yet as the demands on the machine you need increasingly precise movements of your 1s and 0s (or a whole lot more/better machinery to move them with.) If your abstraction doesn't move your 1s and 0s as precisely as you need them (as is the case here) you need to drop down a level and get closer to them.

Eventually we will hit one boundary that we won't be able to abstract away: Electrons. :)

Drew on June 30, 2009 9:30 AM

"Selecting all columns with the star (*) operator is expensive, and that's what LINQ to SQL always does by default."

Um, not so much Jeff. LINQ to SQL selects exactly what you ask it to select by default. If you change the second-to-last line of your initial query from "select p)." to "select p.Id).", LINQ to SQL will just select the Ids.

Saying that LINQ to SQL selects all columns with the * operator by default is no different than saying SQL itself selects all columns with the * operator by default. Which is to say, it's absurd. If you write "select p" you'll get all columns using the * operator, just as if you write "select *" in native SQL. In neither case is it the default, it's merely what you asked for.

Brent on June 30, 2009 9:31 AM

The real problem is finding where the abstraction leaks. If you use a profiler along with it you only have to change your LINQ querys with SQL in the parts of your code where it really slows down everything.

Is LINQ really wroth it? Seriously, it is whole another language to learn and to use, in the end you still have to know SQL and know it well in order to optimize the slow parts of your code.

Hoffmann on June 30, 2009 9:34 AM

Justin Chase is correct. All you need to do is project the data. I am not sure is saying

select category.ID

will do the projection, but i know that saying

select new { category.ID }

will project the data and give you what you want. Abstractions leak the most when you don't spend the time to understand how the system works.

Darren Kopp on June 30, 2009 9:37 AM

@Hoffmann: "Is LINQ really wroth it? Seriously, it is whole another language to learn and to use"

It's important to note we're talking about LINQ-to-SQL here, not just any LINQ to variant.

I don't think it's accurate to describe LINQ-to-SQL as a "whole another language" - it's a bit of syntax to the .NET languages. It's not like, say, sitting down and learning Ruby. If you already know SQL, even a little bit, learning LINQ-to-SQL isn't a big deal.

Marcel Lamothe on June 30, 2009 9:38 AM

I see you do

foreach (var post in posts.ToList())

in your code. Why do you call .ToList() before running over the results?

configurator on June 30, 2009 9:38 AM

@Dennis:

Fail. Microsoft is not abandoning LINQ.

What's the point of using LINQ? When you learn to write code, you'll appreciate the difference between compile-time and run-time errors.

Raj on June 30, 2009 9:39 AM

Seems appropriate to share this link on the topic of abstractions: http://cuiltheory.wikidot.com/

Francesco Rizzi on June 30, 2009 9:39 AM

@Bruno you're absolutely right, there is caching, at very least in the sense of "Oh i know where that is!".

I wrote a DB abstraction library based on LINQ, but written for PHP, and the performance compared to straight sql is HORRENDOUS (10ms using the lib, vs 0.2ms with mysql_query($query)). I'm working on improving it and PHP 5.3.0 will go a long way to help with that. But even with that performance hit, I so greatly prefer manipulating objects and calling functions than dealing with arrays and piecing together sql query strings.

Ryan Tenney on June 30, 2009 9:45 AM

@janos erdelyi

Yeah I find it totally creepy too. Database already has an abstraction layer - stored procedures, NOT written by procedural programmers who don't know crap about how to treat a database. I'm working with an application right now which does nothing but client-side dynamic SQL, and the performance is horrible, it is prone to errors, it is capable of screwing up data integrity, and overall it is extremely buggy. Why? Because the people who wrote all the SQL don't know crap about SQL. LINQ can not be good for this situation.

Jasmine on June 30, 2009 9:49 AM

If you think LinqToSql is bad then what do you think about developers that make up for their poor SQL skills by using SQL CLR functions?

...not that I'd know any developers that bad or anything... uh.... like clearly not me.... :)

webdev_hb on June 30, 2009 9:54 AM

@Dennis

Microsoft is not abandoning LINQ. Nor are they completely abandoning LINQ to SQL, at least not yet. They have stated that they will focus the majority of their efforts on the Entity Framework instead, which (in theory) will have the same capabilities and usage pattern as LINQ to SQL, plus a lot more.

As "the point of dealing with LINQ", it is that you can write queries that the compiler understands, instead of writing an opaque blob. Yes, you will probably still have to understand SQL in order to get the most out of it, but you don't have to mix your C# code with your SQL code, which leads to a more maintainable system.

David Nelson on June 30, 2009 9:57 AM

I think this is a poor argument as presented, but I think there is still a point to be made. The problems you have seem (to me) to be with declarative languages in general. The problems you have in your example exist in SQL alone so it's wrong to point the finger at LINQ (as others have stated).

To the point you could make, though, all of what you say is true regarding SQL. Since there is a "shroud" (call it abstraction if you will) of sorts in every declarative language (no way to say exactly how to do things - just what you want), you will always have the behavior mentioned above.

The reason you are more forgiving to SQL is you have learned enough about the query engine to have *some* say in *how* the query will indeed run.

If I had to guess, your beef in general, is with declarative languages.

My $.02

Russ on June 30, 2009 10:01 AM

> we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity

I find the use of the word "piling" dangerous here. As in, too many coders are afraid to go back and readjust the "lower level" abstraction in the name of "frozen API", instead opting for the little cozy hack that one day will [this part removed due to excessive swearing].

It is important to always strive to make abstractions better: ofcourse not changing everything every three minutes, but keeping things up-to-date

Simone Deponti on June 30, 2009 10:19 AM

You've seen Django ORM, right? If not, check it out: http://docs.djangoproject.com/en/dev/#the-model-layer

Beats LINQ big time SQL-wise.

Anonymous on June 30, 2009 10:22 AM

I love LINQ. I have had SQL queries that were very complex and would take quite a bit to wrap my brain around come through much more optimized with LINQ. The issues here isn't LINQ = minscore
orderby p.LastActivityDate descending
select p.Id).
Take(maxposts);

loop over those postIds, or use some method I can't think of right now to turn it into a list of the correct variety (again I'm not typign this in Visual Studio)

Then:

var posts = from p in DB.Posts where postIds.Contain(p.Id) select p;

Kearns on June 30, 2009 10:47 AM

There is no need to put ToList() in your code samples, just foreach over the IQueryable object.

If you just want the ID's tell LINQ to SQL and it will do that for you too.

e.g.
var posts =
(from p in DB.Posts
where
p.PostTypeId == PostTypeId.Question &&
p.DeletionDate == null &&
p.Score >= minscore
orderby p.LastActivityDate descending
select p.ID).
Take(maxposts);

As to the performance figures of issuing multiple queries instead of one in my experience this has performed poorly. I wonder what exactly is going on with your database...

[)amien

Damien Guard on June 30, 2009 10:51 AM

@Jeff

I agree that all abstractions are leaky. However, I disagree that "leak" == "fail". The point of abstractions is NOT to make the underlying mechanism completely irrelevant; the point is to allow us to focus on the task instead of the mechanism as much as possible, and to accomplish the task more easily and more safely than is possible with the underlying mechanism.

LINQ to SQL allows me to focus on the task I am working on instead of the mechanism by which I am getting my data out of the database and into objects. It makes it easy and relatively safe to work with data, which is the whole point. Therefore it has succeeded in its goal. The fact that it is not a perfect abstraction is irrelevant, especially given that there is no such thing, as Joel pointed out in the first quote.

David Nelson on June 30, 2009 10:51 AM

Even if an abstraction is horribly leaky, it's still useful for communication purposes. If you have 10 people on your team that speak English and one that speaks English and Hindi, then that one person will translate for the others. If you're translating a foreign concept, some foreign words might leak out, but having 95% of that translation in a form you understand makes that leakiness easier to handle.

[Aside: No one makes the argument that everyone must speak English and Hindi. But if you make the same argument that 10 developers speak C# and one speaks C# and SQL, the masses will proclaim that everyone should speak C# and SQL.]

It's the same argument when you talk about customer documentation: The document is an abstraction for the application and the code behind it, it's completely inadequate and inaccurate, but easier to deal with from the customer point of view. Oftentimes developer speak leaks out - dialog, server, xml, but it's usually tolerable, because an attempt has been made to abstract away the gritty details (i.e. Be thankful I didn't drop the source code in your lap and run).

SteveJ on June 30, 2009 10:55 AM

So Jeff with your Wide, Wide World Of Sports comment maybe you should retitle this article to Blazing SQLs

Doug on June 30, 2009 11:00 AM

It seems unfair to me that you criticize abstractions because of their lack of performance optimizations. That's like attacking an automatic transmission for not having the performance of a stick. It's a miracle the automatic transmission even approaches the performance of a stick.

Abstractions are all about functionality. You get the functionality without having to delve into the details. They are not about performance other than not being too awful.

This is just the nature of things. You can go play some pickup basketball with friends and not have to think into the details of what you are doing. But if you want to get the best possible performance as a player you have to get into all the little details of how you do every move and do specific exercises to tune your body for those moves.

Tolerable performance issues do not make an abstraction leaky or a failure. Suboptimal performance is not an issue that abstractions are intended to address.

Matthew Kane on June 30, 2009 11:02 AM

Abstractions are definitely the most important bit of programming, and are not as the title of this post implies, failed in any general way what so ever. The only way we can accomplish anything within a reasonable amount of time is by leveraging and learning various levels of abstraction. Else, you should resort to programming machine code. Oh wait, machine code is an abstraction aswell. How about carefully manipulating electrical signals instead?

tedeh on June 30, 2009 11:10 AM

re: MS and L2Sql.

While Microsoft hasn't come right out and said Linq to SQL is dead, they did the next best (worst?) thing by saying that they were going to focus all their efforts on EF. After making that statement, we quickly abandoned any thought of using L2Sql on our development projects.

Andrew on June 30, 2009 11:21 AM

You are all missing the point.

LINQ doesn't make things slower. Doing silly things in database queries makes things slower. The point is that LINQ is a failed abstraction because you STILL have to know what things you aren't supposed to do in a relational database if you want performance; so what is it abstracting?

Nicolas on June 30, 2009 11:22 AM

@Jon Galloway,

Actually, within the Rails world, there is a plugin that monitors how ActiveRecord ORM is used by the application and the modifies the ActiveRecord calls (and thereby the SQL it outputs) to better suit that.

For example, if you start with Select * from Posts, but you only use ID, Body, Title, the plugin will modify the AR statement to only select those fields from Posts.

As for the article, n+1 queries will never be faster than n query selecting the records you want. It's just silly saying otherwise.

Srdjan on June 30, 2009 11:29 AM

Did you clear input buffers and procedure caches before running each of these? The results don't make any sense, unless there's a pathologically bad query plan for statement 1. Since that's doubtful - the query is pretty simple - I'd guess you didn't account for cached reads.

Common sense should tell you that
SELECT * FROM Table WHERE Condition
is faster than
SELECT * FROM Table WHERE Id IN (SELECT Id FROM Table WHERE Condition)

So much so, in fact, that SQL may even optimize the 2nd case into the 1st case.

Do me a favor, and run SET STATISTICS IO ON before running your queries. Then take a look at the number of IO's required for each scenario.

Mark Brackett on June 30, 2009 11:37 AM

Why compare the performance of SELECT * against SELECT Id? Of course SELECT Id is faster, it only has one column to deal with that's probably the clustered index. Why not compare SELECT * against SELECT Id, Title, etc. If you need to populate a collection of Post objects (and not just get the Ids) that comparison would expose whether or not L2S is just being lazy with SELECT * or if it's really no different performance-wise than selecting all the columns.

John Sheehan on June 30, 2009 11:38 AM

Why compare the performance of SELECT * against SELECT Id? Of course SELECT Id is faster, it only has one column to deal with that's probably the clustered index. Why not compare SELECT * against SELECT Id, Title, etc. If you need to populate a collection of Post objects (and not just get the Ids) that comparison would expose whether or not L2S is just being lazy with SELECT * or if it's really no different performance-wise than selecting all the columns.

John Sheehan on June 30, 2009 11:38 AM

I think the use of the word 'failed' is an exaggeration. I know what you're meaning to say and I agree though.

I've been having a lot of fun* engineering some abstractions for typography: fonts and text rendering, oh my. I need 1 abstraction that can cover both GDI and WPF's typography systems. It's not fun, and it is leaky or incomplete, but it will work and be beneficial. That is, until the next version when I decide I need to pivot the topology of the whole API just to enable some important rendering feature or increase reliability/performance.

* not

Rick Brewster on June 30, 2009 11:53 AM

From reading the comments it seems that a lot of people don't realize what it means for an abstraction to be "leaky".

An abstraction is a simplification or reduction in the rules of an underlying system.

For an abstraction to be leaky simply means that there will be some cases where actions that, according to the abstraction, are equivalent (i.e. they should be exactly the same thing) but will behave differently. Because the abstraction defines them as equivalent but they aren't when actually used, they demonstrate that the abstraction is not "reality", but only a simplification of it.

If an abstraction were not leaky - if all actions that are equivalent according to the abstraction behaved identically - then it would be indistinguishable from the underlying system and thus considered to be equivalent to it (which by definition, is not abstract).


So all abstractions are leaky by definition, because if they aren't leaky then they include all rules of the underlying system, which makes them the same as the underlying system.

Another affect of this is that the more you reduce the "leakiness" of an abstraction, the closer and closer it reflects the underlying system, and thus the more complicated and less useful it becomes as an abstraction - since the purpose of an abstraction is to simplify the underlying system.

Personally, I feel that the ideal solution is to understand the underlying system, and to define multiple abstractions that each handle a subset of the underlying system, but that together have 99% coverage. This results in smaller abstractions that can remain relatively simple, and the remaining 1% can be handled by directly using the underlying system (which will be complicated and a pain in the butt, but still possible).

JohnC on June 30, 2009 12:06 PM

Jeff,

one more year of shallowness on your blog and you will be bought out by Comedy Central.

Have you actually read and understood what you wrote? It is complete and utter nonsense.

First, you are comparing apples and oranges. Second, you do not provide a convincing scenario for using LINQ.

In your previous post, you are praising Apple and iPhone 3G - apparently not knowing anything about simple and elegant abstractions *across* APIs on Apple platform. Isn't this some sort of hint, if not a full proof, that simple abstractions work well?

securityhorror on June 30, 2009 12:06 PM

You're not comparing apples to apples. You asked LINQ for the entire object, so it selected everything. Ask it for the ID, and it selects the ID. The SQL generated in this case is identical to the hand generated SQL.

Tim on June 30, 2009 12:24 PM

Jeff,
I suggest rewriting the article with examples comparing apples to apples. It makes no sense comparing the performance of a LINQ to SQL star (*) query against a single-column raw query (which, obviously, takes the lead).

It's so easy to do compiled-, single-column- and narrow queries with LINQ to SQL (and LINQ to EF, for that matter). Did you even bother to look into compiled queried? Ever heard of the First, FirstOrDefault or Single extension methods?

I generally enjoy reading your blog entries, but this one was a waste of time, and just plain wrong.

Anders Borum on June 30, 2009 12:27 PM

I think some of the comments are wrong.

The goal of LINQ to SQL (and other ORMs) is not to keep the developer from having to learn SQL; learning SQL is just as important as learning your primary programming language (e.g. C#). Obviously, you don't have to, but it makes sense.

LINQ to SQL etc. provides a flexible CRUD mechanism that relieves the developer from maintaining a huge chunk of plumbing; who cares about plumbing such as CRUD tiers? Clients certainly don't care about it; they care about business value and there's (typically) little business value in a CRUD tier.

Anders Borum on June 30, 2009 12:34 PM

Why not make an attribute, affecting strings, that would make the compiler know the string contains an SQL query and it would check the completeness of the query, causing compile-time errors when an SQL clause goes awry?

Something like:

[SqlString(Connection="SqlConnectionString1")]
string query = "select * from foo.bar";

I'm sure we could complicate SqlStringAttribute further, providing schemas, SQL dialects, limitations and whatnot.

Gregor Petrin on June 30, 2009 12:37 PM

Jeff, you should have asked on Stackoverflow how to optimize your LINQ before you wrote this blogpost.

Nosredna on June 30, 2009 12:38 PM

Higher-level abstractions don't have to gloss over the lower-level details. You can have the 'clean' abstractions, and you can _also_ have high-level abstractions for specifying the required lower-level details for performance.

That is, you don't have to deal with the lower-level details in a lower-level way.

James on June 30, 2009 12:44 PM

To expand on the 'select new{p.Id}' comments, you can also avoid multiple db trips by selecting everything you need (not just the id):

(from p in db.Posts
where ...
select new PostListItem{Id = p.Id, Title=p.Title, ...})
.Take(48)

I like to strong-type the results, and draw a subtle distinction between the canonical domain class (ie Post) and a related type (PostListItem). One might even simplify with extension methods:

(from p in db.Posts
where ...
select p)
.Take(48)
.AsPostListItems()

Daniel on June 30, 2009 12:53 PM

Good post.

I won't go on about your example, that's been covered. I won't go on about the fact that everything is an abstraction, that's been covered too.

I would like to bring you up on "Like desperate citizens manning a dike in a category 5 storm ...".

May I suggest it is more like "Desperately juggling bowling balls, chainsaws and car batteries while trying to keep a dozen plates spinning on their poles during a force 10 gale of pompous criticism." :)

Paul

Paul on June 30, 2009 1:05 PM

I've written my own C# Stored Procedure classes using generics, which for me, work very well. I simply create an instance of a stored procedure, add parameters, and call the execute method which returns an array of objects of the desired type.

e.g.

StoredProcedure sp = new StoredProcedure("get_users");
sp.AddParameter("@age", 21);
User[] users = sp.ExecuteWithReturnObjects();

I've spent no more than 10 minutes learning about Linq since it was launched, it looks far too messy for my liking. I'm sure it has it's advantages, but from what I've seen, I've no desire to learn it as I'm happy with my own classes.

Bry on July 1, 2009 2:06 AM

The linq example is not literally identical to sql. An ORM has quite a bit more overhead than raw sql. Maybe the linq example happens to generate the same sql but aside from that they do pretty different things. Maybe I'm just being pedantic, but you did take the effort to bold and italicize the word.

Nicholas Wright on July 1, 2009 2:24 AM

I think you missed the point of Joel's original post. The valuable information I got from Joel's post wasn't that abstractions are doomed, worthless things because they can't perfectly hide, but instead that *you should be aware of what can go wrong*. If you use an abstraction, you need to know what's going on underneath to some extent, so you can deal with the inevitable leak.

What I find most amusing with this post is that you seem to have failed to grasp this from Joels post and so wrote your own version of it using an updated abstraction :)

workmad3 on July 1, 2009 2:30 AM

The whole point about abstractions is that they leave stuff out. This isn't "leaky" (which sounds like a fault) but the raison d'etre. In this case, the abstraction tells you what the result of your query should be, not how long it will take. If you're interested in speed (and good if you are), you simply can't expect SQL/LINQ to tell you.

William on July 1, 2009 2:36 AM

I’m surprised that only very few people are interested in the actual SQL issue itself. It seems to me that there must be something wrong with the way Jeff measured the queries. My experience is that SQL Server (as any other modern database) is much smarter in most cases that we are, and that if you come up (according to you) with an optimal query plan, SQL Server usually does not perform worse, and usually it does something better. There are definitely edge cases, but simple queries like the ones Jeff is talking about, should be straightforward.

First, I hope that Jeff is intentionally omitting the ORDER BY clause from the simplified snippets:

select top 48 Id from Posts

select top 48 * from Posts

Without the ORDER BY, it’s completely different story. But since it’s so trivial, I assume that Jeff just omitted it for clarity.

What I don’t get mainly is the following. According to the Jeff's measurements, doing something like:

declare @ids table (id int PRIMARY KEY);
insert into @ids (id) select top 48 * from Posts order by SomeField;
select * from Posts in @ids;

should take the 404 ms. But, as I said before, it’s likely that SQL Server uses something along these lines (or pehrpas something more efficient) when executing this:

select top 48 * from Posts order by SomeField;

So there must be something fishy in the measurements:

1. How was the timing done? In SQL? In a client app?

2. What the time actually is? CPU? Entire query?

3. Jeff, did you make sure you used DBCC DROPCLEANBUFFERS before every test?

I would be also good to see query plans. I’m sure the explanation of this issue must be something simple.

Jan Zich on July 1, 2009 2:50 AM

@Przemek and others: Chill. Jeff didn't say that LINQ should not be used. He didn't denounce it, rather the opposite.

So the lessons I see here is this:

1) LINQ2SQL is an abstraction in that it wants you to consider databases as object collections. It is leaky, because there exists cases where you need to know that this is not really true. That's the whole meaning of "leaky abstraction".

Note that it doesn't matter that you can write Linq-statements that gets better results, because in order to do so you need to bypass the abstraction that Linq provides. It's still leaky.

Also note that "leaky" is not an insult! SQL itself is also a leaky abstraction: sometimes you need to be aware that your data is not really all in RAM but resides on a disk, and that disks are spinning metal platters with reading arms that need to move.


2) The point of this is that you *should* use abstractions, since they are easier to use and often good enough, but you should also be aware of their limitations when you reach the edge cases.


Oh, and of course:
3) SQL is weird, it's counter-intuitive that top 48(*) would be slower than first getting just the IDs then looping over the result to get the rest.

Good post.

Console on July 1, 2009 3:03 AM

@Console thank you for your explanation, I might have missed the point initially

Przemek on July 1, 2009 3:45 AM

I heart stored procedures.

Pre-compilation has to be a bonus, business rules at the DB level ensures integrity across multiple apps, plus you can offload to your DBAs for performance tweaks.

I may be missing something though with regards to the benefits of OO access to a relational DB, maybe for smaller apps.

c on July 1, 2009 3:48 AM

@JessicaBoxer:

"You are correct that slow pages loose users, however 404s and Web page exceptions loose a lot more.

The obvious solutions to the problem you state are:
1. Buy another server.
2. If that doesn't work, buy another one.
3. Repeat 1 and 2 until your problem goes away.

Servers are much cheaper than fixing even a few dozen lines of program code."

I understand what you mean and don't want to sound like an environmental extremist, but I really hate comments like this. Surely I can understand that one cannot spend hundreds of hours of tuning if a single extra server will do the same job. However, I would expect that people put effort and competence into tuning, and that buying a server is not the default solution. Learn to utilize the existing power of your hardware. If you screw up your SQL, fix it, don't buy a new server to cover up the performance problem, which in essence was a competence problem.

It is this whole mentality of "unlimited resources" and "commodity economics" that gets us humans into trouble. You clearly see it in operating systems too. My current hardware is hundreds of times more performant than 10 years ago, still as an end user I have to wait just as long. Capacity is wasted at an obscene level.

I will not bother to go into the environmental cost of it all, as I doubt anyone will care.

Ferdy on July 1, 2009 3:49 AM


Bit of a straw-man argument. Joel's warning that useful abstractions tend to be leaky was surely not intended to say "don't use abstractions".

Norval Artus on July 1, 2009 3:55 AM

Very apropos -- this sort of thing just came up on Stack Overflow. "Which key is more optimized?"

http://stackoverflow.com/questions/1058765/mysql-which-key-is-more-optimized/1058915#1058915

Ignoring the non-answer of "Test it and see"; in order to really start somewhere you have to KNOW how SQL, and in this case MySQL, indexes are constructed so you can generalize to an optimal answer.

This is the sort of thing that won't be easily abstracted away any time soon. Even Microsoft SQL Server's Index Tuning Wizard (now Database Engine Tuning Adviser) that has been in development for MANY YEARS often gets it godawfually wrong.

For the forseeable future developers will need knowledge of SQL DBMS fundamentals.

--
Matt

MattRogish on July 1, 2009 4:49 AM

@Ferdy: I couldn't agree more.

Marcel Lamothe on July 1, 2009 6:03 AM

@c (and others): Linq to SQL can be used with stored procedures.

http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx

Marcel Lamothe on July 1, 2009 6:06 AM

Burrrrrn! The commentators really ate you up with this one and this time i actually have to agree with them.

Jazz on July 1, 2009 6:26 AM

Call me a stick-in-the-mud if you want, but I confess I'm yet to be persuaded that LINQ and the entity framework are anything more than a stored-proc replacement written by those who don't fully understand databases. I've tried both ways on plenty of big projects and no way do I want to embed any more direct table access SQL in my app code than I absolutely have to - it just gets fragile and harder to keep track of far too quickly. I can live with and deal with the lack of typing far more easily, and if the SP is properly set up it's not going to crash at runtime due to a malformed query anyway is it?

It's got better since, but I remember getting very annoyed with ASP.Net 1.x, feeling like it was written by a VB team to address the problems of a beginner web dev - as an experienced web dev, it felt like it utterly missed the point and made some very silly base assumptions seem routine.

Would be nice to see a new .Net toy that felt like it was designed by the domain experts and not the interns.

Greg Webb on July 1, 2009 6:27 AM

Well, duh. The devil always has been, and always will be, in the details.

There's no such thing as general code, or general actions. It's just a large morass of specific code and specific actions.

The minute you generalize or abstract, you're hiding relevant, and important details. They're both the same thing, really.

Cynoclast on July 1, 2009 6:43 AM

There's another problem with this article that no one else has noticed (based upon my skimming of the comments--I apologize in advance if I'm wrong).

There's a very good reason (and it has nothing to do with LINQ) that:

SELECT TOP 48 * FROM X ORDER BY ID

would be slower than
SELECT * FROM X WHERE ID = n

where n goes from 1 to 48.

The first query does more.

The second looped query makes an assumption (perhaps a valid one...I don't know) about the data that the first one can not. The assumption is that there are no holes.

what if there's no record with ID = 24? Then to get the top 48, you'll have to look at ID's 1 to 49. The first query does that. The second loop does not.

I have no problem with using a slower query (LINQ or otherwise) that's actually going to return all of my data, over using a faster one that isn't.

Chris J. Breisch on July 1, 2009 6:47 AM

I think many of the criticisms of this entry are missing the point. That said, I'd like to add my own point:

The problem with LINQ is the same as that of any attempt at a "general" or "generic" interface to multiple database systems. You'd have the same sorts of issues using ODBC, Pythons ABI, JDBC, etc. Jeff is right that LINQ is a "leaky abstraction", but it's just the latest in a long series of them.

The real problem is that every database is different when it comes to performance optimization. The "select id" may be more efficient for many databases (and it should be), but you can't in general count on any query being optimal for all target databases. One example is that SQL Server and Informix like you to write several queries and link them with temporary tables, but Oracle much prefers you to use sub-queries aggressively and let the optimizer handle it.

So, if you want ease of programming, use LINQ, or ABI, or JDBC. If you want performance, use native SQL. Understand the optimizer. Try different queries. Resort to optimizer hints if necessary. And, of course, design the physical and logical database structures properly.

A. L. Flanagan on July 1, 2009 6:55 AM

I believe LINQ to SQL is an unnecessary abstraction. It just adds another layer to your database and doesn't let you control how many calls you send to the database. Just remove it already and stick your SQL in stored procedures. You'll probably get 3x the speed just doing that.

Even Microsoft has relized LINQ to SQL is a lost cause.
http://oakleafblog.blogspot.com/2008/05/is-adonet-team-abandoning-linq-to-sql.html

I think LINQ is better suited for objects and xml. LINQ to Objects and LINQ to XML are both very good implementations.

Donny V on July 1, 2009 7:14 AM

"We can't just pretend all our data is formatted into neat, orderly data structures sitting there in memory."

I don't really understand what this is supposed to mean. Of course your data is formatted in neat orderly data structures. In your case that happens to be inside the database, but either way the most efficient way to access it is to address the actual structure of that data.

fool on July 1, 2009 7:16 AM

Why aren't you using a stored procedure ?

Anonymous on July 1, 2009 7:43 AM

More comments»

The comments to this entry are closed.