In programming, abstractions are powerful things:
Joel Spolsky has an article in which he states
All non-trivial abstractions, to some degree, are leaky.This is overly dogmatic - for example, bignum classes are exactly the same regardless of the native integer multiplication. Ignoring that, this statement is essentially true, but rather inane and missing the point. Without abstractions, all our code would be completely interdependent and unmaintainable, and abstractions do a remarkable job of cleaning that up. It is a testament to the power of abstraction and how much we take it for granted that such a statement can be made at all, as if we always expected to be able to write large pieces of software in a maintainable manner.
But they can cause problems of their own. Let's consider a particular LINQ to SQL query, designed to retrieve the most recent 48 Stack Overflow questions.
var posts = (from p in DB.Posts where p.PostTypeId == PostTypeId.Question && p.DeletionDate == null && p.Score >= minscore orderby p.LastActivityDate descending select p). Take(maxposts);
The big hook here is that this is code the compiler actually understands. You get code completion, compiler errors if you rename a database field or mistype the syntax, and so forth. Perhaps best of all, you get an honest to goodness post object as output! So you can turn around and immediately do stuff like this:
foreach (var post in posts.ToList())
{
Render(post.Body);
}
Pretty cool, right?
Well, that Linq to SQL query is functionally equivalent to this old-school SQL blob. More than functionally, it is literally identical, if you examine the SQL string that LINQ generates behind the scenes:
string query = "select top 48 * from Posts where PostTypeId = 1 and DeletionDate is null and Score >= -4 order by LastActivityDate desc";
This text blob is of course totally opaque to the compiler. Fat-finger a syntax error in here, and you won't find out about it until runtime. Even if it does run without a runtime error, processing the output of the query is awkward. It takes row level references and a lot of tedious data conversion to get at the underlying data.
var posts = DB.ExecuteQuery(query);
foreach (var post in posts.ToList());
{
Render(post["Body"].ToString());
}
So, LINQ to SQL is an abstraction -- we're abstracting away raw SQL and database access in favor of native language constructs and objects. I'd argue that Linq to SQL is a good abstraction. Heck, it's exactly what I asked for five years ago.
But even a good abstraction can break down in unexpected ways.
Consider this optimization, which is trivial in the old-school SQL blob code: instead of pulling down every single field in the post records, why not pull just the id number? Makes sense, if that's all I need. And it's faster -- much faster!
| select top 48 * from Posts | 827 ms |
| select top 48 Id from Posts | 260 ms |
Selecting all columns with the star (*) operator is expensive, and that's what LINQ to SQL always does by default. Yes, you can specify lazy loading, but not on a per-query basis. Normally, this is a non-issue, because selecting all columns for simple queries is not all that expensive. And you'd think pulling down 48 measly little post records would be squarely in the "not expensive" category!
So let's compare apples to apples. What if we got just the id numbers, then retrieved the full data for each row?
| select top 48 Id from Posts | 260 ms |
| select * from Posts where Id = 12345 | 3 ms |
Now, retrieving 48 individual records one by one is sort of silly, becase you could easily construct a single where Id in (1,2,3..,47,48) query that would grab all 48 posts in one go. But even if we did it in this naive way, the total execution time is still a very reasonable (48 * 3 ms) + 260 ms = 404 ms. That is half the time of the standard select-star SQL emitted by LINQ to SQL!
An extra 400 milliseconds doesn't sound like much, but slow pages lose users. And why in the world would you perform a slow database query on every single page of your website when you don't have to?
It's tempting to blame Linq, but is Linq really at fault here? These seem like identical database operations to me:
1. Give me all columns of data for the top 48 posts.
or
1. Give me just the ids for the top 48 posts.
2. Retrieve all columns of data for each of those 48 ids.
So why in the wide, wide world of sports would one of these seemingly identical operations be twice as slow as the other?
The problem isn't Linq to SQL. The problem is that we're attempting to spackle a nice, clean abstraction over a database that is full of highly irregular and unusual real world behaviors. Databases that:
That's what's so frustrating. We can't just pretend all our data is formatted into neat, orderly data structures sitting there in memory, lined up in convenient little queues for us to reach out and casually scoop them up. As I've demonstrated, even trivial queries can have bizarre behavior and performance characteristics that are not at all clear.
To its credit, Linq to SQL is quite flexible: we can use strongly typed queries, or we can use SQL blob queries that we cast to the right object type. That flexibility is critical, because so much of our performance depends on these quirks of the database. We default to the built-in Linq language constructs, and drop down to hand-tuning ye olde SQL blobs where the performance traces tell us we need to.
Either way, it's clear that you've got to know what's happening in the database every step of the way to even begin understanding the performance of your application, much less troubleshoot it.
I think you could make a fairly solid case that Linq to SQL is, in fact, a leaky and failed abstraction. Exactly the kind of thing Joel was complaining about. But I'd also argue that virtually all good programming abstractions are failed abstractions. I don't think I've ever used one that didn't leak like a sieve. But I think that's an awfully architecture astronaut way of looking at things. Instead, let's ask ourselves a more pragmatic question:
Does this abstraction make our code at least a little easier to write? To understand? To troubleshoot? Are we better off with this abstraction than we were without it?
It's our job as modern programmers not to abandon abstractions due to these deficiencies, but to embrace the useful elements of them, to adapt the working parts and construct ever so slightly less leaky and broken abstractions over time. Like desperate citizens manning a dike in a category 5 storm, we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity.
As much as I may curse Linq to SQL as yet another failed abstraction, I'll continue to use it. Yes, I may end up soggy and irritable at times. But it sure as heck beats drowning.
| [advertisement] Interested in agile? See how a world-leading software vendor is practicing agile. |
Oracle Pro C/C++ does the same thing using native SQL. I wonder why Microsoft won't let programmers use SQL.
Darrin on June 30, 2009 7:07 AMInteresting post, and something I do keep coming back to when I consider higher level web frameworks such as Ruby / ActiveRecord Vs PHP / SQL. I suppose it depends on how much complexity the abstraction saves you having to deal with (and perhaps security it provides / doesn't provide) versus the potential efficiency "leaks". Difficult call to make?
Dave on June 30, 2009 7:25 AMErm, should have said "higher level web programming languages" there.
Dave on June 30, 2009 7:25 AMI'm not sure why you blame this behavior on Linq and you've proven that it is unexpected behavior and happens even if you aren't using Linq. If you haven't proven that then I've misunderstood you and you need to go back and do a little more research. Anyway, it seems that your point is that Linq chooses a default method that is sometimes less than optimal in "weird" situations. That's not saying much in my opinion.
Matt on June 30, 2009 7:26 AMJeff , i think i followed the solution you proposed in recent project. We started with defaulted LINQ 2 SQL operations but then restricted the set of columns we require to perform in each operation by using custom Value Objects. Hence we optimized where we needed and use defaulted where we know it does not matter (at the end very rear cases left though lolz).
Usman on June 30, 2009 7:32 AMThis was a good read! I still don't use LINQ to SQL.
Josh Stodola on June 30, 2009 7:33 AMThe problem with LINQ is that you have to learn LINQ and you still have to know SQL in all details in order to optimize and debug LINQ queries.
So if LINQ abstraction doesn't protect you from complexity of dealing with SQL -- then what's the point of dealing with LINQ?
I'm not surprised that Microsoft is on the way of abandoning LINQ.
Dennis Gorelik on June 30, 2009 7:37 AMMatt's right; you seem to have shot your thesis to hell by having identical results after removing Linq from the equation. It looks like you want to blame Linq, realize you can't, then just throw up your hands and say "it's all just bad abstractions!"
It looks to me like the problem is in the SQL engine. I'd be interested to see the timing of the same operations in different SQL engines (SQL Server, Oracle, MySQL, PostgreSQL), controlling for syntax and feature variations between them.
By the way, the ORM I'm most familiar with is ActiveRecord, which is good about making the common things simple while allowing (or encouraging) dropping to SQL for the more complex things.
I hate to nag, but I think you misunderstood Joel's point. I think he was getting at the very fact you pointed out: you have to be careful about treating abstractions as if they're really as clean as they look. You really need to understand how it meshes with at least a layer or two up and down to avoid bad algorithms and performance-sapping holes.
Aurrin on June 30, 2009 7:50 AM@Dennis,
Can you provide any evidence that Microsoft is abandoning LINQ? I know they are combining the LINQ-to-SQL and the EF teams, but I have not heard anything about the abandonment of LINQ.
Nathan on June 30, 2009 7:55 AMam i the only one here grossed out by this? if i had my DBA hat on, i'd also be carrying a DBA Knife to kill you in your sleep.
i understand the programmer allure, but damn... * selects by default?
janos erdelyi on June 30, 2009 7:58 AMCan't you just select just the ID in the linq2sql query?
var posts =
(from p in DB.Posts
where
p.PostTypeId == PostTypeId.Question &&
p.DeletionDate == null &&
p.Score >= minscore
orderby p.LastActivityDate descending
select p.Id). // <-- magic here, select only the ID
Take(maxposts);
If you want perfect abstractions, start programming this: http://opa.yale.edu/news/article.aspx?id=6764
lc on June 30, 2009 8:00 AMI don't think the statement: "All non-trivial abstractions, to some degree, are leaky." is true.
Some abstractions are ill-fitting and leaky, but not all. You can, for instance, use C to complete any work done in assembler. There may actually be a few OS specific usages, but encapsulated they don't leak out past the OS. With a higher abstraction like C, you can get more work accomplished (and it is more portable).
Regular expressions are another example of something that fully covers their expression space. Of course, if you try to go beyond their expressive capabilities and attempt to do something crazy like write a C compiler entirely out of REs, you will fail. But that's not leaking, it is just misuse, somewhat equivalent to using a hammer to pound in screws.
Paul.
Paul W. Homer on June 30, 2009 8:00 AMWhy not just use SQL? It's much more straightforward. I don't understand the desire to rewrite the SQL to some object notation. All it does is add unnecessary overhead to the application. Is looping through a SQL query really all that difficult?
Practicality on June 30, 2009 8:01 AMYou always have to understand all the details behind an abstraction to use it well. Java resource leaks are a really good example of this. "You don't have to allocate your own memory!" but you better understand how you are using it.
I don't think this means an abstraction has failed, it just means you can't pretend complexity doesn't exist.
asdf on June 30, 2009 8:03 AMIn my experience Linq doesn't generate * selects if you made a proper DBML, but I can't ascertain that for sure right now.
However, if you want to retrieve only the IDs, LINQ2SQL gives you a couple of pretty good options. You can either use a stored procedure method, in which case your SQL gets validated by the database, or you can just create a smaller PostStub class only with the bits that you need.
As far as abstractions go, LINQ2SQL is awesome. It's not complex enough to be hard to manage, and it's still got enough features that you can massage it into doing whatever you want in a fairly elegant way. What else could you want?
Ilia Jerebtsov on June 30, 2009 8:04 AMAlso a good framework just tries to conform to the 90/10 rule. It aught to easily solve the problem 90% of the time and ensure that the other 10% of the time it's still possible to solve the problem even if awkward or hard.
Justin Chase on June 30, 2009 8:05 AMJeff, first off I just wanted to say that I've been following your blog for a while now and I've enjoyed your articles. Keep up the good work.
As for the matter at hand, I won't argue the leaky abstraction part, but I do question your implementation. Why not have the top 48 questions cached in memory on the application server? Then have a background thread that updates that list every N seconds? Is it really so important that every user request get the exact recent 48 questions at that moment? Can't you live with a little latency in the name of performance? Can the user not live with a list that is 3 seconds out of date (hell it might be out of date by the time is takes to service his/her request anyways)?
I agree that, looked at closely enough, all abstractions leak. However, there may be other approaches to the problem that allow you to sidestep that leak.
rzezeski on June 30, 2009 8:07 AMI think the native integer size of the processor would affect the bignum class, in that operations that fit within the native size would run more quickly than those that don't. Here's a test, in Python:
bignum = (2**20)
count = 10000000
while count:
x = bignum + 1
count -= 1
This takes about 3.5 seconds on my computer for all bignum values up to 2**30. If you set bignum to be 2**31, it takes closer to 5 seconds, and continues to take about that long as you increase bignum. Python's arbitrary-size integer support is a leaky abstraction, in that it's slower to do anything with large numbers than it is with small numbers.
@janos erdelyi
Indeed.
What is actually going on behind the scenes in SQL server when you SELECT * is an initial query to the master DB system tables to get references to all the columns in your specified table which is then used to build the explicit list of columns for the query you meant.
Now, I expect this process is more than just a bit optimized - SELECT * is after all a pretty ubiquitous shortcut, but it is still a shortcut and hence has a cost, and it's one of the first things any performance-obsessed DBA (i.e all of 'em!) will tell you to stop doing.
As to why just getting the ID column is quicker, that's because for most common SQL server DB designs an ID/autonumber field serves as both the primary key and the clustered index for that table, which is going to get massively optimized as well.
I find for any complex queries across very large data sets the only acceptable solution is to *first* fetch the set of IDs/keys that match the filter criteria, stick them in a temporary table and then join that on the named list of specific columns you need to SELECT from the original table, and, more likely, any other related ones too.
And ideally this should all be done within a stored procedure so you get the benefits of pre-compilation of the SQL query plan as well as not having to make multiple network calls from web to DB server.
I have no idea whether LINQ supports the column/field mapping + editor integration with stored procedure outputs, but if it doesn't I won't even consider it.
Xiani on June 30, 2009 8:15 AMI don't see anyone else commenting on this, but why call ToList on an IQueryable object? IQueryable implements IEnumerable, so calling ToList basically forces you to process the list of results twice:
* Once to convert the result to a List
* Once to process the items in the for loop
Since this is an article aimed at performance, wouldn't it be a quick (if somewhat trivial) win?
Adrian Anttila on June 30, 2009 8:19 AMMost abstractions have a tradeoff. Often the cost is less flexibility and lower performance. But what you get is usually productivity, reliability, safety, etc. Over time, though, the tradeoff goes from being a bad idea in all but trivial situations up to the status quo. You seem to have left out the fact that SQL itself is an abstraction and is even used as an example in the linked Joel Spolsky article!
"The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same. You're not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster."
But the same is true about most programming languages. Look at C. Originally, programmers still dipped down to assembly for the most intensive code. That started to happen less over time as computers got faster. Then people started building on C, and C was the new "lowest level". It happens all the time.
Your conclusion shows that you and I aren't actually in disagreement. Some leaks get fixed, some leaks stop mattering. Over time, some abstractions get thrown away, and others get built on with more abstractions. What I will disagree with you on, is your title which I'm sure was just intended to be a little inflammatory. Just because it leaks, does not mean it failed.
Russ on June 30, 2009 8:20 AMIt still seems to me that the cons of this particular abstraction outweigh the pros.
Practicality on June 30, 2009 8:27 AMJeff, what the hell?
You're getting a "select *" in your SQL because you used the IDENTICAL "select p" in Linq. Change the Linq to "select p.Id" and you'll get just a "select Id" in the SQL.
This isn't a leaky/failed abstraction at all. Please tell me that you actually knew this all along, but just chose a really terrible example to make your point?
I get the whole thing about indexing and datatype sizes and yaddayadda, but if that is your point, then you have the exact same problem writing actual SQL without the abstraction. You can only deal with those other issues if you have control over the database design itself.
Aaron G on June 30, 2009 8:29 AMWhile I agree the database is the main suspect, I don't think it is guilty of a performance offence.
During the second example, the database might have cached the row ids from the first query knowing you were going to ask for the full rows in the next query.
It might not have the rows already in its hands, but it knew exactly where to retrieve them from the disks.
Disclaimer: I'm not a database developer, but it is what I'd do if I was one.
Bruno Laturner on June 30, 2009 8:29 AMWhile I agree the database is the main suspect, I don't think it is guilty of a performance offence.
During the second example, the database might have cached the row ids from the first query knowing you were going to ask for the full rows in the next query.
It might not have the rows already in its hands, but it knew exactly where to retrieve them from the disks.
Disclaimer: I'm not a database developer, but it is what I'd do if I was one.
Bruno Laturner on June 30, 2009 8:29 AMAbsolutely agree with the core statement here: Abstractions are great until you hit their limits.
Abstractions are great for what they're there for: Making code much easier to read and write. Yet as the demands on the machine you need increasingly precise movements of your 1s and 0s (or a whole lot more/better machinery to move them with.) If your abstraction doesn't move your 1s and 0s as precisely as you need them (as is the case here) you need to drop down a level and get closer to them.
Eventually we will hit one boundary that we won't be able to abstract away: Electrons. :)
Drew on June 30, 2009 8:30 AM"Selecting all columns with the star (*) operator is expensive, and that's what LINQ to SQL always does by default."
Um, not so much Jeff. LINQ to SQL selects exactly what you ask it to select by default. If you change the second-to-last line of your initial query from "select p)." to "select p.Id).", LINQ to SQL will just select the Ids.
Saying that LINQ to SQL selects all columns with the * operator by default is no different than saying SQL itself selects all columns with the * operator by default. Which is to say, it's absurd. If you write "select p" you'll get all columns using the * operator, just as if you write "select *" in native SQL. In neither case is it the default, it's merely what you asked for.
Brent on June 30, 2009 8:31 AMThe real problem is finding where the abstraction leaks. If you use a profiler along with it you only have to change your LINQ querys with SQL in the parts of your code where it really slows down everything.
Is LINQ really wroth it? Seriously, it is whole another language to learn and to use, in the end you still have to know SQL and know it well in order to optimize the slow parts of your code.
Hoffmann on June 30, 2009 8:34 AMJustin Chase is correct. All you need to do is project the data. I am not sure is saying
select category.ID
will do the projection, but i know that saying
select new { category.ID }
will project the data and give you what you want. Abstractions leak the most when you don't spend the time to understand how the system works.
Darren Kopp on June 30, 2009 8:37 AM@Hoffmann: "Is LINQ really wroth it? Seriously, it is whole another language to learn and to use"
It's important to note we're talking about LINQ-to-SQL here, not just any LINQ to variant.
I don't think it's accurate to describe LINQ-to-SQL as a "whole another language" - it's a bit of syntax to the .NET languages. It's not like, say, sitting down and learning Ruby. If you already know SQL, even a little bit, learning LINQ-to-SQL isn't a big deal.
Marcel Lamothe on June 30, 2009 8:38 AMI see you do
foreach (var post in posts.ToList())
in your code. Why do you call .ToList() before running over the results?
configurator on June 30, 2009 8:38 AM@Dennis:
Fail. Microsoft is not abandoning LINQ.
What's the point of using LINQ? When you learn to write code, you'll appreciate the difference between compile-time and run-time errors.
Seems appropriate to share this link on the topic of abstractions: http://cuiltheory.wikidot.com/
Francesco Rizzi on June 30, 2009 8:39 AM@Bruno you're absolutely right, there is caching, at very least in the sense of "Oh i know where that is!".
I wrote a DB abstraction library based on LINQ, but written for PHP, and the performance compared to straight sql is HORRENDOUS (10ms using the lib, vs 0.2ms with mysql_query($query)). I'm working on improving it and PHP 5.3.0 will go a long way to help with that. But even with that performance hit, I so greatly prefer manipulating objects and calling functions than dealing with arrays and piecing together sql query strings.
Ryan Tenney on June 30, 2009 8:45 AM@janos erdelyi
Yeah I find it totally creepy too. Database already has an abstraction layer - stored procedures, NOT written by procedural programmers who don't know crap about how to treat a database. I'm working with an application right now which does nothing but client-side dynamic SQL, and the performance is horrible, it is prone to errors, it is capable of screwing up data integrity, and overall it is extremely buggy. Why? Because the people who wrote all the SQL don't know crap about SQL. LINQ can not be good for this situation.
Jasmine on June 30, 2009 8:49 AMIf you think LinqToSql is bad then what do you think about developers that make up for their poor SQL skills by using SQL CLR functions?
...not that I'd know any developers that bad or anything... uh.... like clearly not me.... :)
webdev_hb on June 30, 2009 8:54 AM@Dennis
Microsoft is not abandoning LINQ. Nor are they completely abandoning LINQ to SQL, at least not yet. They have stated that they will focus the majority of their efforts on the Entity Framework instead, which (in theory) will have the same capabilities and usage pattern as LINQ to SQL, plus a lot more.
As "the point of dealing with LINQ", it is that you can write queries that the compiler understands, instead of writing an opaque blob. Yes, you will probably still have to understand SQL in order to get the most out of it, but you don't have to mix your C# code with your SQL code, which leads to a more maintainable system.
I think this is a poor argument as presented, but I think there is still a point to be made. The problems you have seem (to me) to be with declarative languages in general. The problems you have in your example exist in SQL alone so it's wrong to point the finger at LINQ (as others have stated).
To the point you could make, though, all of what you say is true regarding SQL. Since there is a "shroud" (call it abstraction if you will) of sorts in every declarative language (no way to say exactly how to do things - just what you want), you will always have the behavior mentioned above.
The reason you are more forgiving to SQL is you have learned enough about the query engine to have *some* say in *how* the query will indeed run.
If I had to guess, your beef in general, is with declarative languages.
My $.02
Russ on June 30, 2009 9:01 AMIt's unfortunate that legions of developers will scan the post, not read the corrections in the comments, and will endlessly parrot "spolsky and atwood said not to use LINQ to SQL".
As several other people commented, if you only want certain fields, you should select them. I think the point you were getting at was that the ORM should figure out what fields you're eventually going to use, but that seems like it might set off some NP alarms.
Jon Galloway on June 30, 2009 9:19 AM> we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity
I find the use of the word "piling" dangerous here. As in, too many coders are afraid to go back and readjust the "lower level" abstraction in the name of "frozen API", instead opting for the little cozy hack that one day will [this part removed due to excessive swearing].
It is important to always strive to make abstractions better: ofcourse not changing everything every three minutes, but keeping things up-to-date
Simone Deponti on June 30, 2009 9:19 AMYou've seen Django ORM, right? If not, check it out: http://docs.djangoproject.com/en/dev/#the-model-layer
Beats LINQ big time SQL-wise.
Anonymous on June 30, 2009 9:22 AMI question why Jeff is using Linq at Stack Overflow at all?
Is Stack Overflow ever going to run on a non-sql server platform?
I would only consider Linq if we needed to support more than one DB platform. Running your own site on a specific platform doesn't meet that criteria. In that case you are better off writing Stored Proc and maximizing database performance and dealing with SQL and the RDMS engine directly.
I love LINQ. I have had SQL queries that were very complex and would take quite a bit to wrap my brain around come through much more optimized with LINQ. The issues here isn't LINQ = minscore
orderby p.LastActivityDate descending
select p.Id).
Take(maxposts);
loop over those postIds, or use some method I can't think of right now to turn it into a list of the correct variety (again I'm not typign this in Visual Studio)
Then:
var posts = from p in DB.Posts where postIds.Contain(p.Id) select p;
Kearns on June 30, 2009 9:47 AMThere is no need to put ToList() in your code samples, just foreach over the IQueryable object.
If you just want the ID's tell LINQ to SQL and it will do that for you too.
e.g.
var posts =
(from p in DB.Posts
where
p.PostTypeId == PostTypeId.Question &&
p.DeletionDate == null &&
p.Score >= minscore
orderby p.LastActivityDate descending
select p.ID).
Take(maxposts);
As to the performance figures of issuing multiple queries instead of one in my experience this has performed poorly. I wonder what exactly is going on with your database...
[)amien
@Jeff
I agree that all abstractions are leaky. However, I disagree that "leak" == "fail". The point of abstractions is NOT to make the underlying mechanism completely irrelevant; the point is to allow us to focus on the task instead of the mechanism as much as possible, and to accomplish the task more easily and more safely than is possible with the underlying mechanism.
LINQ to SQL allows me to focus on the task I am working on instead of the mechanism by which I am getting my data out of the database and into objects. It makes it easy and relatively safe to work with data, which is the whole point. Therefore it has succeeded in its goal. The fact that it is not a perfect abstraction is irrelevant, especially given that there is no such thing, as Joel pointed out in the first quote.
David Nelson on June 30, 2009 9:51 AMEven if an abstraction is horribly leaky, it's still useful for communication purposes. If you have 10 people on your team that speak English and one that speaks English and Hindi, then that one person will translate for the others. If you're translating a foreign concept, some foreign words might leak out, but having 95% of that translation in a form you understand makes that leakiness easier to handle.
[Aside: No one makes the argument that everyone must speak English and Hindi. But if you make the same argument that 10 developers speak C# and one speaks C# and SQL, the masses will proclaim that everyone should speak C# and SQL.]
It's the same argument when you talk about customer documentation: The document is an abstraction for the application and the code behind it, it's completely inadequate and inaccurate, but easier to deal with from the customer point of view. Oftentimes developer speak leaks out - dialog, server, xml, but it's usually tolerable, because an attempt has been made to abstract away the gritty details (i.e. Be thankful I didn't drop the source code in your lap and run).
SteveJ on June 30, 2009 9:55 AMSo Jeff with your Wide, Wide World Of Sports comment maybe you should retitle this article to Blazing SQLs
Doug on June 30, 2009 10:00 AMIt seems unfair to me that you criticize abstractions because of their lack of performance optimizations. That's like attacking an automatic transmission for not having the performance of a stick. It's a miracle the automatic transmission even approaches the performance of a stick.
Abstractions are all about functionality. You get the functionality without having to delve into the details. They are not about performance other than not being too awful.
This is just the nature of things. You can go play some pickup basketball with friends and not have to think into the details of what you are doing. But if you want to get the best possible performance as a player you have to get into all the little details of how you do every move and do specific exercises to tune your body for those moves.
Tolerable performance issues do not make an abstraction leaky or a failure. Suboptimal performance is not an issue that abstractions are intended to address.
Matthew Kane on June 30, 2009 10:02 AMAbstractions are definitely the most important bit of programming, and are not as the title of this post implies, failed in any general way what so ever. The only way we can accomplish anything within a reasonable amount of time is by leveraging and learning various levels of abstraction. Else, you should resort to programming machine code. Oh wait, machine code is an abstraction aswell. How about carefully manipulating electrical signals instead?
tedeh on June 30, 2009 10:10 AMLinq to SQL is an abstraction? Not by my definition of abstraction. It's a domain specific embedded language that allows you to write SQL in C# - the C# AST fragment that the LINQ expression is translated to has a one-to-one correspondence with the SQL that is generated.
Stuart Dootson on June 30, 2009 10:19 AMre: MS and L2Sql.
While Microsoft hasn't come right out and said Linq to SQL is dead, they did the next best (worst?) thing by saying that they were going to focus all their efforts on EF. After making that statement, we quickly abandoned any thought of using L2Sql on our development projects.
Andrew on June 30, 2009 10:21 AMYou are all missing the point.
LINQ doesn't make things slower. Doing silly things in database queries makes things slower. The point is that LINQ is a failed abstraction because you STILL have to know what things you aren't supposed to do in a relational database if you want performance; so what is it abstracting?
Nicolas on June 30, 2009 10:22 AM@Jon Galloway,
Actually, within the Rails world, there is a plugin that monitors how ActiveRecord ORM is used by the application and the modifies the ActiveRecord calls (and thereby the SQL it outputs) to better suit that.
For example, if you start with Select * from Posts, but you only use ID, Body, Title, the plugin will modify the AR statement to only select those fields from Posts.
As for the article, n+1 queries will never be faster than n query selecting the records you want. It's just silly saying otherwise.
Srdjan on June 30, 2009 10:29 AMDid you clear input buffers and procedure caches before running each of these? The results don't make any sense, unless there's a pathologically bad query plan for statement 1. Since that's doubtful - the query is pretty simple - I'd guess you didn't account for cached reads.
Common sense should tell you that
SELECT * FROM Table WHERE Condition
is faster than
SELECT * FROM Table WHERE Id IN (SELECT Id FROM Table WHERE Condition)
So much so, in fact, that SQL may even optimize the 2nd case into the 1st case.
Do me a favor, and run SET STATISTICS IO ON before running your queries. Then take a look at the number of IO's required for each scenario.
Why compare the performance of SELECT * against SELECT Id? Of course SELECT Id is faster, it only has one column to deal with that's probably the clustered index. Why not compare SELECT * against SELECT Id, Title, etc. If you need to populate a collection of Post objects (and not just get the Ids) that comparison would expose whether or not L2S is just being lazy with SELECT * or if it's really no different performance-wise than selecting all the columns.
John Sheehan on June 30, 2009 10:38 AMWhy compare the performance of SELECT * against SELECT Id? Of course SELECT Id is faster, it only has one column to deal with that's probably the clustered index. Why not compare SELECT * against SELECT Id, Title, etc. If you need to populate a collection of Post objects (and not just get the Ids) that comparison would expose whether or not L2S is just being lazy with SELECT * or if it's really no different performance-wise than selecting all the columns.
John Sheehan on June 30, 2009 10:38 AMEverything you do is an abstraction. C# is an abstraction of the CLI. The CLI is an abstraction of C. C is an abstraction of assembler. Assembler is an abstraction of the microcode. If you want no abstractions, the only option is a field programmable gate array. Yes, they are fast. The point is that you need to decide whether the performance hit and leaks are worth the reduced programming effort.
Grant Johnson on June 30, 2009 10:43 AMThe abstraction argument is so pathetic (IMHO).
What is sql, c#, filesystems, etc except one big fat abstraction after another on top of ones and zeros.
damon on June 30, 2009 10:53 AMI think the use of the word 'failed' is an exaggeration. I know what you're meaning to say and I agree though.
I've been having a lot of fun* engineering some abstractions for typography: fonts and text rendering, oh my. I need 1 abstraction that can cover both GDI and WPF's typography systems. It's not fun, and it is leaky or incomplete, but it will work and be beneficial. That is, until the next version when I decide I need to pivot the topology of the whole API just to enable some important rendering feature or increase reliability/performance.
* not
Rick Brewster on June 30, 2009 10:53 AMFrom reading the comments it seems that a lot of people don't realize what it means for an abstraction to be "leaky".
An abstraction is a simplification or reduction in the rules of an underlying system.
For an abstraction to be leaky simply means that there will be some cases where actions that, according to the abstraction, are equivalent (i.e. they should be exactly the same thing) but will behave differently. Because the abstraction defines them as equivalent but they aren't when actually used, they demonstrate that the abstraction is not "reality", but only a simplification of it.
If an abstraction were not leaky - if all actions that are equivalent according to the abstraction behaved identically - then it would be indistinguishable from the underlying system and thus considered to be equivalent to it (which by definition, is not abstract).
So all abstractions are leaky by definition, because if they aren't leaky then they include all rules of the underlying system, which makes them the same as the underlying system.
Another affect of this is that the more you reduce the "leakiness" of an abstraction, the closer and closer it reflects the underlying system, and thus the more complicated and less useful it becomes as an abstraction - since the purpose of an abstraction is to simplify the underlying system.
Personally, I feel that the ideal solution is to understand the underlying system, and to define multiple abstractions that each handle a subset of the underlying system, but that together have 99% coverage. This results in smaller abstractions that can remain relatively simple, and the remaining 1% can be handled by directly using the underlying system (which will be complicated and a pain in the butt, but still possible).
JohnC on June 30, 2009 11:06 AMJeff,
one more year of shallowness on your blog and you will be bought out by Comedy Central.
Have you actually read and understood what you wrote? It is complete and utter nonsense.
First, you are comparing apples and oranges. Second, you do not provide a convincing scenario for using LINQ.
In your previous post, you are praising Apple and iPhone 3G - apparently not knowing anything about simple and elegant abstractions *across* APIs on Apple platform. Isn't this some sort of hint, if not a full proof, that simple abstractions work well?
securityhorror on June 30, 2009 11:06 AMYou're not comparing apples to apples. You asked LINQ for the entire object, so it selected everything. Ask it for the ID, and it selects the ID. The SQL generated in this case is identical to the hand generated SQL.
Tim on June 30, 2009 11:24 AMJeff,
I suggest rewriting the article with examples comparing apples to apples. It makes no sense comparing the performance of a LINQ to SQL star (*) query against a single-column raw query (which, obviously, takes the lead).
It's so easy to do compiled-, single-column- and narrow queries with LINQ to SQL (and LINQ to EF, for that matter). Did you even bother to look into compiled queried? Ever heard of the First, FirstOrDefault or Single extension methods?
I generally enjoy reading your blog entries, but this one was a waste of time, and just plain wrong.
Anders Borum on June 30, 2009 11:27 AMI think some of the comments are wrong.
The goal of LINQ to SQL (and other ORMs) is not to keep the developer from having to learn SQL; learning SQL is just as important as learning your primary programming language (e.g. C#). Obviously, you don't have to, but it makes sense.
LINQ to SQL etc. provides a flexible CRUD mechanism that relieves the developer from maintaining a huge chunk of plumbing; who cares about plumbing such as CRUD tiers? Clients certainly don't care about it; they care about business value and there's (typically) little business value in a CRUD tier.
Anders Borum on June 30, 2009 11:34 AMWhy not make an attribute, affecting strings, that would make the compiler know the string contains an SQL query and it would check the completeness of the query, causing compile-time errors when an SQL clause goes awry?
Something like:
[SqlString(Connection="SqlConnectionString1")]
string query = "select * from foo.bar";
I'm sure we could complicate SqlStringAttribute further, providing schemas, SQL dialects, limitations and whatnot.
Gregor Petrin on June 30, 2009 11:37 AMJeff, you should have asked on Stackoverflow how to optimize your LINQ before you wrote this blogpost.
Nosredna on June 30, 2009 11:38 AMTo expand on the 'select new{p.Id}' comments, you can also avoid multiple db trips by selecting everything you need (not just the id):
(from p in db.Posts
where ...
select new PostListItem{Id = p.Id, Title=p.Title, ...})
.Take(48)
I like to strong-type the results, and draw a subtle distinction between the canonical domain class (ie Post) and a related type (PostListItem). One might even simplify with extension methods:
(from p in db.Posts
where ...
select p)
.Take(48)
.AsPostListItems()
Good post.
I won't go on about your example, that's been covered. I won't go on about the fact that everything is an abstraction, that's been covered too.
I would like to bring you up on "Like desperate citizens manning a dike in a category 5 storm ...".
May I suggest it is more like "Desperately juggling bowling balls, chainsaws and car batteries while trying to keep a dozen plates spinning on their poles during a force 10 gale of pompous criticism." :)
Paul
Paul on June 30, 2009 12:05 PMnot true. "All abstractions that you notice are failed abstractions"
the fact that the database stores blobs of data on disk, not tables or columns or indexes is an abstraction. This one you dont notice
The mirage that the disk really has blocks or files or anything except long streams of ones and zeros is an abstraction too.
THe mirage that the disk has long steams of ones and zeros is an abstraction - in fact it has only areas of varying magnetism spinning on platters at high speed . Unless of course its a USB drive, or a CD, or a ram disk.
pm100 on June 30, 2009 1:02 PMI'm no fan of LINQ, but, as other commenters already stated, your example is wrong. You need to select the ID field in your LINQ query.
I don't disagree that one needs to know everything about how their backing store to get the best performance. Given that, I see no reason at all to use LINQ.
If I have a DAL that can get me data into a DataTable in one function call, and that data can be passed up the stack (unlike the LINQ result, which can't, because it's an anonymous type), I don't see how LINQ buys me ANYTHING at all.
Someone mentioned using a background thread to refresh a cache. Wow, talk about swatting a fly with a cannon. How about just timing out the cache item and upon retrieval checking the timeout, and if it's expired, re-query the database, otherwise return the cache item. No other threads involved, no possible multi-threaded mistakes, etc. Easy-peasy. Of course, this assumes on actually designed a DAL and the DAL's API, and didn't just hack together a mess of code "to make it work."
Robert Barth on June 30, 2009 1:07 PMI agree with Joel's statement that all non-trivial abstractions are leaky in the sense that, by their very nature, abstractions hide details that are considered less important at that particular level. An abstraction has "leaked" the details that it abstracted away.[1]
I disagree, however, with your statement that all abstractions are "failed" abstractions, mostly because you've not defined failure. You've even provided a pragmatic point-of-view where abstractions are not failures (usability, and ease of understanding).
Basically, if an abstraction hides the details that we are concerned with, then it is our job as developers to choose a different (less abstract) abstraction. If you are concerned with query timing and LINQ doesn't give you fine-grained control over that, then LINQ is the wrong abstraction to use.
Abstractions only fail when they are wrong tool for the job, but in that case it's the developer that has failed.
[1] Of course, it *is* possible to prove that any given abstraction has a one-to-one correlation with what it is abstracting, thus making it a "leakless" abstraction. For instance, one might be able to prove that the SQL grammar is fully covered by the LINQ grammar. But in Joel's argument, I'll assume he considers such an abstraction to fall into the "trivial" category.
Very interesting post.
I am curious to know if you use the LINQ or the SQL blob method to implement the ruby on rails style "counter caches" for lets say "the number of votes on a answer" in stack overflow.
Doing it with LINQ you have to retrieve the parent answer row for the child votes rows and change the count on the retrieved answer object and then save it back while handling any concurrency issues.
On the other hand using a SQL blob just entails wrapping in a transaction, an insert into the votes table and an update to the vote count in the counter cache column of the parent answer row.
It seems like using LINQ would be much mode complicated and less efficient.
I've said it for a while, that what they should have done instead of Linq to SQL, is add support for inline SQL, like they have support for inline XML in the newest VB.Net. This way, you could have the compiler check the sql for you, and even have dynamic objects returned with the correct properties set up, but you wouldn't have to learn some backwards SQL language, and you could fine tune your queries exactly the same way you had been doing for decades.
Kibbee on June 30, 2009 1:47 PMGoal of every abstraction is "Reducing Complexity". The price we've to pay is "Implementation Overhead".
As long as "Overhead" are acceptable, abstractions are "Not leaky".
In another way why "Abstraction" can be "Leaky" is the "Level" in which they operate. Higher the "Level" more "Leaky". (LINQ vs TCP, one operating at very high level and another at very low level)
Murthy on June 30, 2009 2:18 PMThe essence of your argument is that Linq can sometimes be slow. The solution you pose is to increase the complexity of your code to make it quicker. To do this you propose getting into the guts of SQL Server, and the abstraction layers. However, there is a huge cost associated with doing that -- namely maintainability and complexity. The best code is the code that you don't have to write.
You are correct that slow pages loose users, however 404s and Web page exceptions loose a lot more.
The obvious solutions to the problem you state are:
1. Buy another server.
2. If that doesn't work, buy another one.
3. Repeat 1 and 2 until your problem goes away.
Servers are much cheaper than fixing even a few dozen lines of program code.
In the absence of this:
1. Profile your code to find the 1-5% that is the bottle neck.
2. Rewrite that database access as an SP.
3. Modify your Linq to use the SP.
Jeff, You need to be very careful about Top(n) queries because if you don't have the right index on the appropriate columns you have to scan the complete result set before you can sort it, and only then take your 'top' results.
Based on your post, and to second someones suggestion, why not just select the body column (rather than the Id as other's have suggested) - as that implies an extra step (getting the data that goes with that Id).
Tracing and Explain Plan are so easy in Sql Server that I would strongly recommend that you learn how to use them. As you are using Linq stick Sql Profiler running on your test DB instance and watch all the commands you are running against that DB. It's educational... especially as you will get to see any behind-the-scenes queries that Linq is doing - perhaps of the Data model.
Finally, therefore, I don't agree with you that the abstraction is leaky in the way you think (I don't know Linq well, but am pretty reasonable with SQL). You are just experiencing the joys of performance tuning! Just because it is slow to do string concatenation in C# (compared to Stringbuilder) does not make strings a bad abstraction for, ahem, strings; the implementation just makes it sub-optimal in certain circumstances :)
i agree that you cannot, shouldnot, as a developer just blindly use an abstraction and just expect it to always be the most optimal method for any case. While some abstractions are simple and I don't necessarily understand 100% but can be confident of being relatively optimal for most cases (e.g string.length method etc), something as extreme as abstracting your entire data access layer REQUIRES you to understand what its doing under the covers, or just live with any in-efficiencies that occur. I just don understand developers who implement some pattern or architecture and think that just because it makes sense and is fast for some cases, that it will always handle any situation in the most optimal matter.
Rob on June 30, 2009 2:52 PMUmmm, it's called "there's no free lunch". The trade-off for LINQ to SQL is less control, less speed, etc.
I hate it when programmers use ToList() when it isn't necessary. Then inevitably, other programmers see it on blogs like this and the next thing you know, it's everywhere.
Gr8Ray on June 30, 2009 3:11 PMI'd say the only failed abstractions are ones that either bringing nothing to table or don't hide any details. In your particular example, LINQ to SQL allows the compiler to help you identify query mistakes which increases productivity. The programmer is still free to identify pain points and bottlenecks and then address the particulars at the point of a problem. Solving existing problems over the long run is better than solving all those potential problems you don't have yet. Abstractions let us ignore details until they are important. That's their job.
MrPhil on June 30, 2009 3:26 PMTo the people who ask why they should use LINQ if it still requires understanding SQL:
The answer is that writing Linq-to-SQL code is much less work then alternative approaches (like creating/reading from SqlCommand objects).
I really don't get why this example was chosen, its pretty trivial to change the LINQ query to only grab the ID column.
fschwiet on June 30, 2009 3:40 PM@fschwiet: Once you've written a class that can read data from a stored procedure into a DataTable in a generic way, you'll never have to do it again (I haven't had to update mine in years). One function call and I've got data from the database. And I didn't have to use an anonymous type or that horrible lambda expression nonsense.
Robert Barth on June 30, 2009 5:21 PMAfter seeing LINQ used for database abstraction in a large project, I think it was a huge mistake (although LINQ expressions are very useful).
A better abstraction, I think, is to have a RESTful interface over HTTP that applications use. (The actual RESTful queries would not need any abstraction between it and SQL, and would be closely coupled to the schema.) But all applications downstream would essentially have a simple, clean, domain specific interface for manipulating data objects.
This doesn't solve all data problems -- certain schema changes would require changes to the RESTful interface. But if done right, I think this abstraction is about as leak-proof as you can get, when dealing with databases.
Jon on June 30, 2009 5:55 PMHigher-level abstractions don't have to gloss over the lower-level details. You can have the 'clean' abstractions, and you can _also_ have high-level abstractions for specifying the required lower-level details for performance.
That is, you don't have to deal with the lower-level details in a lower-level way.
James on June 30, 2009 11:44 PMHi,
as some of the commenters already pointed out there is a serious leakage in the reasoning behind the samples in this post, or to put it more bluntly THEY ARE WRONG. This serves as a trmendous disfavor to the creators of LINQ and all those developers who have been using it in their work (as in, "Hey, are you using this *crap* that Jeff denounced recently?")
So, I propose to you Jeff to create a new post and to apologise for your mistake and openly say that in fact LINQ is less leaky as you claimed.
Przemek (software developer at Goyello).
Anonymous on July 1, 2009 12:33 AMI've written my own C# Stored Procedure classes using generics, which for me, work very well. I simply create an instance of a stored procedure, add parameters, and call the execute method which returns an array of objects of the desired type.
e.g.
StoredProcedure sp = new StoredProcedure("get_users");
sp.AddParameter("@age", 21);
User[] users = sp.ExecuteWithReturnObjects();
I've spent no more than 10 minutes learning about Linq since it was launched, it looks far too messy for my liking. I'm sure it has it's advantages, but from what I've seen, I've no desire to learn it as I'm happy with my own classes.
Bry on July 1, 2009 1:06 AMI think you missed the point of Joel's original post. The valuable information I got from Joel's post wasn't that abstractions are doomed, worthless things because they can't perfectly hide, but instead that *you should be aware of what can go wrong*. If you use an abstraction, you need to know what's going on underneath to some extent, so you can deal with the inevitable leak.
What I find most amusing with this post is that you seem to have failed to grasp this from Joels post and so wrote your own version of it using an updated abstraction :)
workmad3 on July 1, 2009 1:30 AMThe whole point about abstractions is that they leave stuff out. This isn't "leaky" (which sounds like a fault) but the raison d'etre. In this case, the abstraction tells you what the result of your query should be, not how long it will take. If you're interested in speed (and good if you are), you simply can't expect SQL/LINQ to tell you.
William on July 1, 2009 1:36 AMI’m surprised that only very few people are interested in the actual SQL issue itself. It seems to me that there must be something wrong with the way Jeff measured the queries. My experience is that SQL Server (as any other modern database) is much smarter in most cases that we are, and that if you come up (according to you) with an optimal query plan, SQL Server usually does not perform worse, and usually it does something better. There are definitely edge cases, but simple queries like the ones Jeff is talking about, should be straightforward.
First, I hope that Jeff is intentionally omitting the ORDER BY clause from the simplified snippets:
select top 48 Id from Posts
select top 48 * from Posts
Without the ORDER BY, it’s completely different story. But since it’s so trivial, I assume that Jeff just omitted it for clarity.
What I don’t get mainly is the following. According to the Jeff's measurements, doing something like:
declare @ids table (id int PRIMARY KEY);
insert into @ids (id) select top 48 * from Posts order by SomeField;
select * from Posts in @ids;
should take the 404 ms. But, as I said before, it’s likely that SQL Server uses something along these lines (or pehrpas something more efficient) when executing this:
select top 48 * from Posts order by SomeField;
So there must be something fishy in the measurements:
1. How was the timing done? In SQL? In a client app?
2. What the time actually is? CPU? Entire query?
3. Jeff, did you make sure you used DBCC DROPCLEANBUFFERS before every test?
I would be also good to see query plans. I’m sure the explanation of this issue must be something simple.
Jan Zich on July 1, 2009 1:50 AM@Przemek and others: Chill. Jeff didn't say that LINQ should not be used. He didn't denounce it, rather the opposite.
So the lessons I see here is this:
1) LINQ2SQL is an abstraction in that it wants you to consider databases as object collections. It is leaky, because there exists cases where you need to know that this is not really true. That's the whole meaning of "leaky abstraction".
Note that it doesn't matter that you can write Linq-statements that gets better results, because in order to do so you need to bypass the abstraction that Linq provides. It's still leaky.
Also note that "leaky" is not an insult! SQL itself is also a leaky abstraction: sometimes you need to be aware that your data is not really all in RAM but resides on a disk, and that disks are spinning metal platters with reading arms that need to move.
2) The point of this is that you *should* use abstractions, since they are easier to use and often good enough, but you should also be aware of their limitations when you reach the edge cases.
Oh, and of course:
3) SQL is weird, it's counter-intuitive that top 48(*) would be slower than first getting just the IDs then looping over the result to get the rest.
Good post.
Console on July 1, 2009 2:03 AM@Console thank you for your explanation, I might have missed the point initially
Przemek on July 1, 2009 2:45 AMI heart stored procedures.
Pre-compilation has to be a bonus, business rules at the DB level ensures integrity across multiple apps, plus you can offload to your DBAs for performance tweaks.
I may be missing something though with regards to the benefits of OO access to a relational DB, maybe for smaller apps.
c on July 1, 2009 2:48 AM@JessicaBoxer:
"You are correct that slow pages loose users, however 404s and Web page exceptions loose a lot more.
The obvious solutions to the problem you state are:
1. Buy another server.
2. If that doesn't work, buy another one.
3. Repeat 1 and 2 until your problem goes away.
Servers are much cheaper than fixing even a few dozen lines of program code."
I understand what you mean and don't want to sound like an environmental extremist, but I really hate comments like this. Surely I can understand that one cannot spend hundreds of hours of tuning if a single extra server will do the same job. However, I would expect that people put effort and competence into tuning, and that buying a server is not the default solution. Learn to utilize the existing power of your hardware. If you screw up your SQL, fix it, don't buy a new server to cover up the performance problem, which in essence was a competence problem.
It is this whole mentality of "unlimited resources" and "commodity economics" that gets us humans into trouble. You clearly see it in operating systems too. My current hardware is hundreds of times more performant than 10 years ago, still as an end user I have to wait just as long. Capacity is wasted at an obscene level.
I will not bother to go into the environmental cost of it all, as I doubt anyone will care.
Ferdy on July 1, 2009 2:49 AM@Ferdy: I couldn't agree more.
Marcel Lamothe on July 1, 2009 5:03 AM@c (and others): Linq to SQL can be used with stored procedures.
Marcel Lamothe on July 1, 2009 5:06 AMBurrrrrn! The commentators really ate you up with this one and this time i actually have to agree with them.
Jazz on July 1, 2009 5:26 AMCall me a stick-in-the-mud if you want, but I confess I'm yet to be persuaded that LINQ and the entity framework are anything more than a stored-proc replacement written by those who don't fully understand databases. I've tried both ways on plenty of big projects and no way do I want to embed any more direct table access SQL in my app code than I absolutely have to - it just gets fragile and harder to keep track of far too quickly. I can live with and deal with the lack of typing far more easily, and if the SP is properly set up it's not going to crash at runtime due to a malformed query anyway is it?
It's got better since, but I remember getting very annoyed with ASP.Net 1.x, feeling like it was written by a VB team to address the problems of a beginner web dev - as an experienced web dev, it felt like it utterly missed the point and made some very silly base assumptions seem routine.
Would be nice to see a new .Net toy that felt like it was designed by the domain experts and not the interns.
Greg Webb on July 1, 2009 5:27 AMWell, duh. The devil always has been, and always will be, in the details.
There's no such thing as general code, or general actions. It's just a large morass of specific code and specific actions.
The minute you generalize or abstract, you're hiding relevant, and important details. They're both the same thing, really.
Cynoclast on July 1, 2009 5:43 AMThere's another problem with this article that no one else has noticed (based upon my skimming of the comments--I apologize in advance if I'm wrong).
There's a very good reason (and it has nothing to do with LINQ) that:
SELECT TOP 48 * FROM X ORDER BY ID
would be slower than
SELECT * FROM X WHERE ID = n
where n goes from 1 to 48.
The first query does more.
The second looped query makes an assumption (perhaps a valid one...I don't know) about the data that the first one can not. The assumption is that there are no holes.
what if there's no record with ID = 24? Then to get the top 48, you'll have to look at ID's 1 to 49. The first query does that. The second loop does not.
I have no problem with using a slower query (LINQ or otherwise) that's actually going to return all of my data, over using a faster one that isn't.
Chris J. Breisch on July 1, 2009 5:47 AMI think many of the criticisms of this entry are missing the point. That said, I'd like to add my own point:
The problem with LINQ is the same as that of any attempt at a "general" or "generic" interface to multiple database systems. You'd have the same sorts of issues using ODBC, Pythons ABI, JDBC, etc. Jeff is right that LINQ is a "leaky abstraction", but it's just the latest in a long series of them.
The real problem is that every database is different when it comes to performance optimization. The "select id" may be more efficient for many databases (and it should be), but you can't in general count on any query being optimal for all target databases. One example is that SQL Server and Informix like you to write several queries and link them with temporary tables, but Oracle much prefers you to use sub-queries aggressively and let the optimizer handle it.
So, if you want ease of programming, use LINQ, or ABI, or JDBC. If you want performance, use native SQL. Understand the optimizer. Try different queries. Resort to optimizer hints if necessary. And, of course, design the physical and logical database structures properly.
A. L. Flanagan on July 1, 2009 5:55 AMI believe LINQ to SQL is an unnecessary abstraction. It just adds another layer to your database and doesn't let you control how many calls you send to the database. Just remove it already and stick your SQL in stored procedures. You'll probably get 3x the speed just doing that.
Even Microsoft has relized LINQ to SQL is a lost cause.
http://oakleafblog.blogspot.com/2008/05/is-adonet-team-abandoning-linq-to-sql.html
I think LINQ is better suited for objects and xml. LINQ to Objects and LINQ to XML are both very good implementations.
Donny V on July 1, 2009 6:14 AM"We can't just pretend all our data is formatted into neat, orderly data structures sitting there in memory."
I don't really understand what this is supposed to mean. Of course your data is formatted in neat orderly data structures. In your case that happens to be inside the database, but either way the most efficient way to access it is to address the actual structure of that data.
Why aren't you using a stored procedure ?
Anonymous on July 1, 2009 6:43 AMI tried a lil test of my own which goes
select top 1000000 * from table
select top 1000000 ID from table
select * from table where ID in (select top 1000000 ID from table)
These took
118 seconds
8 seconds
175 seconds on the third query.
Rather the opposite to us presuming that getting 1 row takes 1 second, so getting 48 rows with an IN clause takes 48 seconds.
Anonymous on July 1, 2009 6:48 AMWhere is the NOLOCK behind the sql if we are talking about performance?
Mischa Kroon on July 1, 2009 7:09 AMIf all abstractions are failed abstractions, then we shouldn't be using HLLs at all! We should all be programming in C! Or, better yet, Assembler! Or machine code!
All I'm trying to say is that abstraction is good to a degree. Too much is bad, but languages that abstract the creation of assembler instructions are Good Things.
Anonymous on July 1, 2009 7:51 AMIf all abstractions are failed abstractions, then we shouldn't be using HLLs at all! We should all be programming in C! Or, better yet, Assembler! Or machine code!
All I'm trying to say is that abstraction is good to a degree. Too much is bad, but languages that abstract the creation of assembler instructions are Good Things.
Anonymous on July 1, 2009 7:52 AMJeff, frankly speaking for once, I think you are misguiding your readers. You ask for "Select P" and complain about that getting converted to "Select *". If you really wanted only 1 column or a few columns you would certainly use select p.Id or select new {p.Id,p.Name} etc.
Anonymous on July 1, 2009 8:08 AMFerdy wrote:
>However, I would expect that people put effort and
>competence into tuning, and that buying a server is
>not the default solution.
And you think there is not cost to the environment, energy and the human soul from doing that? Perhaps in the time saved debugging, and handling user complaints, the programmer could be out riding a boat to save the whales.
@JessicaBoxer: Perhaps the programmer could spend time becoming a better programmer!
Marcel Lamothe on July 1, 2009 8:35 AMGiven the fluid syntax, yet indirect control we have over the actual statements sent to the database, I approach LINQ to SQL with cautious optimism.
I see the relationship of LINQ to SQL vs. direct database programming as similar to C# vs. machine code. I trust that the C# compiler and the CLR JIT compiler will produce machine code from my C# code that is sufficiently performant such that the benefit of writing in C# far exceeds the cost of writing machine code directly.
And given that SQL statements are being compiled from my LINQ statements, the "select *" issue may very well be optimized out in the future, and likely with no change to the code I've written.
Craig Boland on July 1, 2009 8:42 AM@JessicaBoxer
Your point is funny but from personal experience there is a quite small limit to how often you can just "add a server" and see a real increase.
I recently upgraded our ERP server from a quad-core with 3 gigs of RAM and 3 disks to an dual quad-core with 32 gigs of RAM, 17 hard drives all 50% faster, separate physical RAID 10 Arrays for the OS, Applications, Database, Log Files, and Temporary database files.
The net result? A 5% increase in performance. Now I know this ERP system like the back of my hand (but didn't write it) and I spent a long time making sure that my hardware configuration exactly follows the vendor's so-called "best practices".
It's simply that the system it runs uses a poorly designed and very leaky database abstraction which makes poor assumptions about its physical environment. (you know like one of those academic papers, "assume the network is of infinite bandwidth and hard drive access is instantaneous").
Bottom line, we paid 15,000 Euros for a 5% performance increase, how many more times do you think any given company could afford to do this? Plus, even if I wanted to, there is a physical limit on how fast a server I can get.
The only way to fix the problem is to fix the abstraction, i.e rewrite the shitty code.
I'll admit, I've never used LINQ yet, but I was interested in possibly adapting it in the future. From reading this, it sounds like a revamp of old-school VB6 "Open Recordset" for people who can't quite get their heads into set-based operations instead of stepping through rows? If that's the case, it's not for me but I see the use.
However:
In the original example, where did the "Top 48" come from to be introduced into the LINQ sql statement? Did LINQ determine there were 48 records in the set to be returned, because I don't see how Jeff's parameters would have otherwise indicated he only wanted 48. Or was this snippet a pseudocode with the limit set outside the example shown? I would find it interesting if LINQ pre-calcuated it's own statistics, and would have to consider that maybe Microsoft launched a second SQL optimizer team within LINQ - which I would explore.
snarf! Using Joel's argument, we may as well go back to using an abacus because all improvements afterwards "leak".
Bah!
Writing correctly executing "slow" code is more important than writing incorrectly or poorly executing "fast" code. Write correct first, optimize where necessary later. Abstractions help us write correct code more easily.
Andrew on July 1, 2009 9:59 AMSo many comments by so many programmers that don't know the first thing about Linq, much less LinqToSql.
These programmers are leakier than the abstractions they complain about.
David B on July 1, 2009 10:37 AMI think LINQ sucks, always something that goes wrong so I use my Sqldatareader and push it in a generic list. Works everytime!!
code monkey on July 1, 2009 11:02 AM@Andrew
Snarf? Isn't that the little guy from ThunderCats?
Practicality on July 1, 2009 12:05 PMThe linq example is not literally identical to sql. An ORM has quite a bit more overhead than raw sql. Maybe the linq example happens to generate the same sql but aside from that they do pretty different things. Maybe I'm just being pedantic, but you did take the effort to bold and italicize the word.
Nicholas Wright on July 1, 2009 1:24 PMAs a developer, I simply cannot do my job unless I understand not just database structures but how the database works. To be honest, I need a good understanding of the underlying OS and storage layouts as well. This needs to be at a more than rudimentary level of knowledge and employers are beginning to realise that the best developers are those people who have a decent amount of experience in all these areas.
Abstractions will be at their most useful when we break down the silos between developers, DBAs and Admins and we are all able to understand exactly what it is the abstraction is hiding.
At this point, abstractions will still be leaky, we will just understand the leak and probably either avoid it or take advantage of it.
Jon on July 1, 2009 1:31 PM
Bit of a straw-man argument. Joel's warning that useful abstractions tend to be leaky was surely not intended to say "don't use abstractions".
Very apropos -- this sort of thing just came up on Stack Overflow. "Which key is more optimized?"
http://stackoverflow.com/questions/1058765/mysql-which-key-is-more-optimized/1058915#1058915
Ignoring the non-answer of "Test it and see"; in order to really start somewhere you have to KNOW how SQL, and in this case MySQL, indexes are constructed so you can generalize to an optimal answer.
This is the sort of thing that won't be easily abstracted away any time soon. Even Microsoft SQL Server's Index Tuning Wizard (now Database Engine Tuning Adviser) that has been in development for MANY YEARS often gets it godawfually wrong.
For the forseeable future developers will need knowledge of SQL DBMS fundamentals.
--
Matt
Hey Jeff...loved the iPhone post, didn't love this post.
LINQ to SQL (or better, yet, call it "LINQ to SQL Server" as that's all it is, kicks ass. It's just a nice fluent DSL over SQL. That's it. It's the thinnest of abstractions.
You really need to correct your code. I'd be happy to do a code review of StackOverflow with you and your guys.
* You don't need to call ToList(). Stop doing that and let IQueryable do its job.
* Use more projections and select new {} anonymous type or select p.ID, rather than select p.
Thanks!
Scott Hanselman on July 1, 2009 8:01 PMI don't like LINQ2SQL. If speed matters then the SqlDatareader will win every time. And why create all these new DSL:s - reason: because the old stuff is so boring to write!!! Good reason -( * NOT * !!!). I think a lot of these software methodlogies are crap, because one day you believe in Method A and the next the day it's Method B, you change your mind like you change underwear!
Now there's coming this Entity thing, and this NHibernate is here already and LINQ - but when you use the sqldatareader life is sweet and simple! It's fast, simple and reliable. Would you want it any other way?
I don't like LINQ2SQL. If speed matters then the SqlDatareader will win every time. And why create all these new DSL:s - reason: because the old stuff is so boring to write!!! Good reason -( * NOT * !!!). I think a lot of these software methodlogies are crap, because one day you believe in Method A and the next the day it's Method B, you change your mind like you change underwear!
Now there's coming this Entity thing, and this NHibernate is here already and LINQ - but when you use the sqldatareader life is sweet and simple! It's fast, simple and reliable. Would you want it any other way?
Abstractions tend to work better if one starts from concrete then abstract on-demand and refrain from going further until needed.
Anyhow, I think this LINQ problem could actually be a case of optimizing at the wrong level because moving a time consuming task out of time constrained path is simpler and longer lasting than making it fit now and having to revisit again when requirements change, usually negatively.
Don Park on July 1, 2009 11:47 PMTight Coupling is not the perfect answer but is a "simple" solution that works. I may not know a lot but I'm humble at least, instead of those arrogant and ignorant bastards who haven't a clue, but believes they are "jedi masters". And playing WOW doesn't make you a computer genius if you thought so!!!
Wanko on July 2, 2009 4:43 AMNot to pile on (which of course is exactly what I'm doing) but you can achieve select ID quite easily with Linq to SQL and I'm amazed you're using Linq to SQL and haven't realized that yet. There's a very strong, predictable correlation between your LINQ query and the resulting SQL.
Once you have a better understanding of Linq to SQL, then you might be able to comment intelligently on how leaky of an abstraction it is.
JohnOpincar on July 2, 2009 7:37 AM@ Xiani
"Now, I expect this process is more than just a bit optimized - SELECT * is after all a pretty ubiquitous shortcut, but it is still a shortcut and hence has a cost, and it's one of the first things any performance-obsessed DBA (i.e all of 'em!) will tell you to stop doing."
IF you are actually getting all the columns, I'd argue that "SELECT *" may be (depending on the DBMS implementation) a MORE performant method than "SELECT COL1, COL2, COL3, COL4, COL5".
SELECT *:
App passes in simple string.
DBMS parses SQL query string
DBMS sees "*" and translates that to a (cached) array of column references
SELECT [list of named columns]
App constructs longer string (either from a cached array of column references or stored in-memory as the intact SQL string)
DBMS parses SQL query string
DBMS sees "COL1" and translates that to a column reference
DBMS sees "COL2" and translates that to a column reference
...
The upside in the latter is that the application *can* rely on column positioning to get specific column values, but that leads to highly fragile code and should generally be avoided by the application programmer.
Not trying to pick nits or anything, but just wanted to point out that optimizing one "layer" of your stack by imposing draconian edicts on other layers isn't always a net performance gain.
Tom Dibble on July 2, 2009 9:53 AMTo add: ubiquitous shortcuts are generally a GOOD thing in an environment, as they can be special-cased for performance. Caching the '*' column reference array is a quick and easy special case optimization.
In general, when allowing for highly expressive environments, the more "shortcuts" you provide your developers the more opportunities you have to increase performance when your developers take advantage of those shortcuts. On the opposite perspective, when developing in a well-written system with prominent shortcuts, it is often better to USE the shortcut and thus take advantage of that special casing instead of trying to outthink the system designer. Cases where the above is NOT true should be highly visible in the documentation.
Tom Dibble on July 2, 2009 10:00 AM@janos erdelyi - "if i had my DBA hat on, i'd also be carrying a DBA Knife to kill you in your sleep."
They gave you a DBA Knife? All I got was the stupid Hat! I can see how the Knife would may so many things so much easier, though. Hmmm, I wonder if RedGate sells them...
(seriously, that's the funniest thing I've seen all week)
Great article (again). A willingness to consider abstractions to see if they make life easier, rather than a blanket "that abstraction is leaky so I won't use it" is essential.
See my post on this mindset in the context of business rules:
http://www.ebizq.net/blogs/decision_management/2009/07/business_rules_are_a_failed_ab.php
JT
James Taylor on July 2, 2009 10:39 AMLinq2Sql is great for rapid prototyping and throw-away scripts.
But if you have to play the performance game on high volume webservers, you better handtune and use sqldatareaders and other forms of optimizations.
Sometimes you have to import sql-data in your own datastructures to gain the needed performance. An sql-server itself is a big layer of abstraction, which of course often leads to performance-bottlenecks.
There is so much FUD regarding LINQ to SQL in the comments to this post that it would take Several paragraphs to correct them all. I won't try.
But - going unmentioned - is the use of lambda expressions rather than the pseudo SQL syntax. This allows the build up of complex queries in easy to understand syntax rather. Say goodbye to the typical joins from hell or procedural code in T-SQL - not the best place for it.
Also, the ability to use the SAME abstraction working over in memory collections, XML, and data from the database provide a comprehensive, consistent model for working with collections. This fact seems to be missing from the discussion.
Jeff - I would take up Mr Hanselman on his offer!
@Hanselman, @JohnOpinCar & others:
You have either missed the point entirely or you are just taking a strawman cheapshot at Jeff. Get down from your high horses and re-read what Jeff actually wrote.
Select p vs select p.id doesn't at all solve Jeffs problem. He wants all the columns. p.id is useful in that once you have found the underlying issue that the abstraction tries to hide you can work around it, but that's totally beside the point.
The point is that you have to bypass the abstraction in order to find the issue in the first place.
Jeffs claim that lin2sql generates "select *" by default is a valid complaint (if it is accurate, I assume it is). Why doesn't select p translate into "select [col1], [col2],.." etc? This is a very minor part of what Jeff wrote, though.
That it is possible to work around that with Linq is good in a way, as it means that the leaky abstraction has a band-aid.
If I want to, Linq lets me have to think about every sql weirdness but uses a different syntax that I have to learn - hooray...no wait, that's just pointless!
However, Linq lets you work with sql in an easy consistent way by abstracting some things for you. This works fine in most cases, which is why linq isn't pointless.
But is a LEAKY abstraction, and I think Jeff has made a good, valid point. This time...;-)
Console on July 4, 2009 3:39 AM"Linq to SQL is a good abstraction"
How so? I've never understood how mangling SQL into a new "Linq" structure is somehow an improvement. I love how it fits in to the program - but I don't see why the structure is so different to SQL.
It reminds me of back in 2002 when one of the VB.Net designers told me that the names of many object oriented features were re-named from C++/C#/standard terminology just so that it was "different". Their argument was if it wasn't different it wouldn't be a different language. Here we have it again with Linq where it is different from SQL just so that it is different. It would be great if they could have based it on SQL - that would have made sense.
So - is Linq's any good? That comes down to perspective. From a programmer's perspective it offers ease of programming. From database performance perspective it is a big failure. You point out only some of the performance issues - there are many other performance issues you don't list.
Linq doesn't allow cached server side execution, which can be huge for large data sets with complicated relationships. Index and other compiler hints, collation issues and more advanced statements can't be done (at least not to my knowledge).
I've found major slow downs specifically caused by Linq. I've done a couple of searches and the general rule of thumb seems to be that data access runs at 50% under Linq compared to not using Linq.
I'm programming for a system that has 4-500 concurrent users on a database with over 100,000 customers and many millions of transactions. So for me the difference can be substantial. A SQL stored procedure, properly tunned and cached, may run in 10 seconds while the same command issued through Linq may take a minute or more to run.
Maybe this wouldn't matter so much on a smaller system.
Philip on July 5, 2009 3:16 PMMicrosoft WANTS developers to use SQL; thye just recommend that it be done in a controlled manner: through stored procedures. LINQ struck those of us in the DBA realm as being completely contradictory to Microsoft's prior recommendations, and apparently there's a bit of a scism in Redmond about it.
In my opinion, it falls in the same category as the "4GL" tools of the 80's: it's a tool to allow poor programmers to write bad code faster.
Kris Cook on July 6, 2009 6:04 AM@Philip - LINQ2SQL != LINQ. LINQ is a generalised querying language for working with many different types of data in an identical manner, from in-memory collections to back-end databases. That's the whole point - make sure you know what you're criticising first.
LINQ2SQL allows for the use of Stored Procedures (see http://weblogs.asp.net/scottgu/archive/2007/08/16/linq-to-sql-part-6-retrieving-data-using-stored-procedures.aspx) if you want generation of query plans ahead of time etc.
Could you provide the links where a 50% speedup is proven for using raw data readers as opposed to LINQ? I'm betting that you can't - MS wouldn't release LINQ2SQL with anything like that sort of overhead.
Enough with the FUD. Learn about the technology before trying to shoot it down.
Oh and where's Jeff in this discussion? He seems to have gone very quiet. I would have thought he would at least have corrected the post after so many people pointed out his mistakes.
Dave R. on July 6, 2009 7:35 AMI know little about LINQ or SQL, but there seems to be a logical fallacy in this post.
The argument is that LINQ is less efficient than SQL, but to optimise the SQL you have to make 48 individual queries to extract 48 rows. Can this be done in a single SQL statement? If not, doesn't that make SQL as much of a "failed" abstraction as LINQ?
By the way, you never state what the time is for the actual 48 query option. Why not? You surely don't think the equation "(48 * 3 ms) + 260 ms = 404 ms", is realistic.
Phenwoods on July 6, 2009 11:35 AM@Stephen says...
> there is a quite small limit to how often you can
> just "add a server" and see a real increase.
Of course that is true. Generally speaking this problem is the inability to run operations in parallel due to interdependency. However, generally speaking database type applications are quite amenable to parallelization. This is due to the fact that reads are far more prevalent than writes (and the fact that SELECT is largely declarative rather than procedural.) Consequently, database performance can often be improved by adding read only servers behind a good load balancing scheme.
In any application there tends to be a few small bottlenecks, and you should apply your "fix the nasty code" process there. Note that often fixing bottlenecks introduces nasty code, of necessity. (For example, introducing code that is highly dependent on the implementation details of a particular server, or breaking the abstraction of the database with such things as nolock.)
> I recently upgraded our ERP server from a quad-core ...
> The net result? A 5% increase in performance.
Two things -- I would not have gone with a big iron server, but rather multiple smaller, simpler servers. Secondly, if that is the case your problem is most likely very nasty code with lots of unnecessary dependencies. This is the sort of code that is written by people who obsess about squeezing every ounce of performance out of a server, which usually means they do so with premature optimization, leading to horrible, nasty, unmaintainable, and tangled code.
I know nothing more about your situation that what you told me, but I would make an educated guess that that was your situation. I'd recommend that you build a quality test case suite, and then start refactoring the heck out of the code.
JessicaBoxer on July 6, 2009 1:20 PMI think it's because the underlying thing being abstracted is really general (e.g., a pencil, a microprocessor) and the abstraction is less general and more oriented towards a particular way of doing something (e.g., word processor, which works in lines of letters instead of line drawings, or shading, or whatever; higher-level language, which may not let you grab those bits and do things with and to them like you can with assembler), so you end up using the abstraction like a waldo or it's like using a wrench as a hammer: it feels horribly inefficient and wrong. The abstraction is more specialized. So you have to understand the underlying thing really really well and the idea of the abstraction really really well in order to push the abstraction around to do what you could do pretty easily with the underlying thing. I am thinking of a particular circa 1980s Jovial compiler ...
Dorothy on July 6, 2009 3:00 PM@David R.
Re-reading my post I can see that I should have been a lot more explicit.
Seeing as the topic was specifically about Linq to SQL I thought it was self evident that this was the subject. I like linq, and I'm not saying anything negative about it in general, just the use of SQL.
As for use of stored procedures through linq, and the use of compiled statemetns - they simply don't work anywhere near as fast. BUT - I was refering to linq statements - compiled or not. (I probably should have stated that).
Please don't assume that we are random users making comments about things we don't understand. I've been doing DBA on SQL for a decade so I'm capable of listinging to what others say and then doing my own tests to see what is correct. I LIKE LINQ (both generally and for SQL) - but it simply doesn't perform for me. I HOPE future versions do.
So - no FUD involved - these are tests I have done. I simply can not get LINQ to perform anywhere near the speed of other methods. Please try it yourself before condemming me as a FUD maker. I even explained my situation to make it clear that it may only be upper end users have have to worry about this.
Also, before you condem that we haven't learnt the techology it would pay for you to do some quick google searches to find out what the actual state of play is.
For instance,I did a google search and turned up something right away
http://www.devtoolshed.com/content/performance-benchmarks-linq-vs-sqldatareader-dataset-selects-part-1
http://www.devtoolshed.com/node/12
If "SELECT TOP 48 * " is really slower (instead of faster) than "SELECT TOP 48 id" plus 48* "SELECT * WHERE id=", something must be badly broken in your database. The second combination creates *by far* more work for the database. I can only guess the experienced difference is because of caching effects.
ammoQ on July 7, 2009 12:35 AMIMHO abstractions aren't fundamentally leaky, the humans who implement them are.
Eagius on July 7, 2009 12:24 PM@Philip - did you actually read the sites you linked to? The guy's tests are hardly real-world. He does 10000 SELECTs on a single 25-row table using three different methods. If you're re-running a single query 10000 times using LINQ without compiling your query, the setup costs are going to be repeated 10000 times and amplified.
If you're generalising that all LINQ2SQL queries exhibit a 50% slowdown based on this, then, yes, I'm going to accuse you of FUD, or at least lack of knowledge about the technology involved. And it really doesn't matter how many years you've done SQL - this argument is about LINQ2SQL, not T-SQL. I'm still waiting for the stats to support your argument - I'm guessing that you can't find them.
It would be interesting to see what sort of effect switching from LINQ2SQL to 'raw' datareaders would have on a site such as StackOverflow. But I'm betting that the developers are happy to pay a small penalty for the increase in productivity afforded to them.
Dave R. on July 8, 2009 5:19 AMI wonder, for your sample table / query, if nesting it works better.
SELECT * from table where ID in ( select top 48 id from table where [clauses] )
Theoretically gives the sql server a chance to optimize that it doesn't get from your independent queries, and gives some additional hint that you don't actually want everything from all entries then filtered to the top 48, but just everything from the top 48, which are subtly different.
Chris on July 8, 2009 11:09 AM@Dave R
I think we aren't entirely in disagreement.
I explicitly detailed the size of my data and stated that my findings may only be due to size of the data I have.
You accuse me of FUD in general, and I disagree. But I actually agree that I am generating FUD in relation to LINQ2SQL for large databases/data warehouses. You are right! I am clearly saying that it is an inappropriate technology for large databases (Doubt). I am clearly saying that it runs slower than other more appropriate technologies on larger databases (Fear). I am saying that the cost/benefit is unknown without the developer first doing benchmarking and testing (Uncertainty). YES - I am preaching FUD for Linq2sql for large databases!
I chose those links because they were some of the first ones I googled. I could probably find more - but I don't have the time to prove myself right when I have already done that ON MY DATA. For all in know LINQ2SQL may be faster in some situations, that's why I STILL test alternative options in order to get the desired performance.
You said the links only used a few rows, and that's a valid issue. So, did YOU repeat the tests on that page with extra data? Did you add in 100,000 to 100,000,000 records and then look at the results? If not then give it a go - all the code was there to do just that. You don’t have to believe the site author, you CAN do the test yourself on data that matches the size of tables you deal with.
Chances are you can't be bothered, and I don’t blame you. I only started benchmarking and testing when I first noticed massive slowdowns when using LINQ2SQL, which is what woke me up to the issue.
I actually agree that there is a balance between the benefits of LINQ2SQL. I DO use Linq2sql for small/quick tasks, the problem is that majority of my data tasks are over large volumes of data. If something takes at least 16 ms to run... you could make it 10 times slower and I'm not going to care. BUT if something takes at least 60 seconds to run, you can put money on the fact I will be doing everything I can to make sure that's all it takes to run.
You mentioned the “small” penalty of LINQ, and this is where the size of the data becomes important. For large data it isn't a small penalty to use LINQ, it is a large one.
To be very specific, I have one table where the current row count is 3,505,920, and another that is 22,381,736. These tables are accessed very frequently. I simply can't get the performance out of LINQ2SQL to justify its use.
I have several hundred concurrent users. I am about to have maybe 800 mobile vehicles that log their gps position every 30 seconds AND provide extra data and GPS positions during specific events - producing a mind blowing amount of daily data.
So for me the speed impact of LINQ matters. StackOverflow? I don't know - that's up to them to do their own technology comparisons and tests. For you? I don’t know.
If you take it that I am generating FUD – fair enough, but the fear, uncertainty and doubt can all be eliminated by some very simple benchmarks that data developers should be doing.
Philip on July 8, 2009 5:09 PM@Dave R
I think we aren't entirely in disagreement.
I explicitly detailed the size of my data and stated that my findings may only be due to size of the data I have.
You accuse me of FUD in general, and I disagree. But I actually agree that I am generating FUD in relation to LINQ2SQL for large databases/data warehouses. You are right! I am clearly saying that it is an inappropriate technology for large databases (Doubt). I am clearly saying that it runs slower than other more appropriate technologies on larger databases (Fear). I am saying that the cost/benefit is unknown without the developer first doing benchmarking and testing (Uncertainty). YES - I am preaching FUD for Linq2sql for large databases!
I chose those links because they were some of the first ones I googled. I could probably find more - but I don't have the time to prove myself right when I have already done that ON MY DATA. For all in know LINQ2SQL may be faster in some situations, that's why I STILL test alternative options in order to get the desired performance.
You said the links only used a few rows, and that's a valid issue. So, did YOU repeat the tests on that page with extra data? Did you add in 100,000 to 100,000,000 records and then look at the results? If not then give it a go - all the code was there to do just that. You don’t have to believe the site author, you CAN do the test yourself on data that matches the size of tables you deal with.
Chances are you can't be bothered, and I don’t blame you. I only started benchmarking and testing when I first noticed massive slowdowns when using LINQ2SQL, which is what woke me up to the issue.
I actually agree that there is a balance between the benefits of LINQ2SQL. I DO use Linq2sql for small/quick tasks, the problem is that majority of my data tasks are over large volumes of data. If something takes at least 16 ms to run... you could make it 10 times slower and I'm not going to care. BUT if something takes at least 60 seconds to run, you can put money on the fact I will be doing everything I can to make sure that's all it takes to run.
You mentioned the “small” penalty of LINQ, and this is where the size of the data becomes important. For large data it isn't a small penalty to use LINQ, it is a large one.
To be very specific, I have one table where the current row count is 3,505,920, and another that is 22,381,736. These tables are accessed very frequently. I simply can't get the performance out of LINQ2SQL to justify its use.
I have several hundred concurrent users. I am about to have maybe 800 mobile vehicles that log their gps position every 30 seconds AND provide extra data and GPS positions during specific events - producing a mind blowing amount of daily data.
So for me the speed impact of LINQ matters. StackOverflow? I don't know - that's up to them to do their own technology comparisons and tests. For you? I don’t know.
If you take it that I am generating FUD – fair enough, but the fear, uncertainty and doubt can all be eliminated by some very simple benchmarks that data developers should be doing.
This is my last post on this page - I'll come back to read if you want to post anything else, but I think I've made my case as clear as I'm able to.
Philip on July 8, 2009 5:42 PMLINQ Rules! I can't believe developers are still writing SQL Strings and using SQLCommands and SqlConnections. I've been able to develop two applications in the time it would take me to develop just one, if I weren't using LINQ.
sam on July 11, 2009 9:20 AMa lot of people miss the entire point of encapsulating data access with stored procedures - it's (also) security. DBA's (myself included) don't want user app's to have direct access to the tables. Using Linq requires that, so security is out the window. Using stored procedures can not only provide more efficient data access, but provide for a designed security layer.
-dennis
2 Dennis Parks: thank you! finally someone pointed this out! I was surprised it took so many posts..
Tomas B on July 15, 2009 5:00 PMWhile I place importance in the performance of any piece of code, I must admit that one of the driving forces in using heavily abstracted technologies -- that by their very nature add heaps of overhead -- is the client.
Clients are less interested in how properly coded their product is than the time-to-production.
I just went through an application that was using a vanilla datagrid to display a a paged result set, but the paging was being done client side and as a result a quarter hundred thousand records were being pulled from the database each page, while onlyl a hundred or so were being displayed.
My guess is that the original developer had not thought how many records there were going to be, instead focusing on the business logic around the abstracting control (the datagrid).
This abstraction was causing intermittent problems with resource usage (although I have not explicitly proved this, I think it is most likely), it had finally leaked.
Now this abstraction may have saved a few hours of coding but in the end it caused more hours of support work, and some user dissatisfaction.
It's not always clear when your abstraction will bite you in the @$$ at the end of the day, but it is important to know the pitfalls your abstraction will have, and when they may occur.
EnderWiggin on July 16, 2009 4:26 AMI just hate the phrase "shoring up." It is such a Palinism. And also (intended), I think you could have just linked to Joel's original article and been done with it.
And also (again, intended), there are many, many much better tools than LINQ-to-SQL. I think this stuff is just throwing stones at glass houses that both save more time and commit far less of these kinds of mistakes.
Charles on July 17, 2009 10:12 PM@Dennis Parks - LINQ2SQL can use stored procedures:
Dave R. on July 20, 2009 5:32 AMI'm really very useful to follow a long-time see this as a blog here Thank you for your valuable information.
OYuN on July 26, 2009 7:20 PMWhy not take your abstraction to the next level and sit your application code over an object model by using an ORM like JDO or JPA?
Then you get to use inheritance, polymorphism and interfaces (all forms of abstraction provided by all good OO languages) which are the crucial tools used to prevent code bloat in large applications.
You'll also find the amount of SQL code you have to write will drop to virtually nothing as the ORM does most of the mind numbingly, 'boring as batsh*t' SQL 'glue code' for you automatically - and yes, it even performs well, very well!
blinko on August 29, 2009 10:20 AMWhy would anyone want to use LINQ. I refuse to use it. I'm going to select data using real TSQL or PLSQL so I can get EXACTLY what I want and be done with it. Unbelievable how people will jump on the MS bandwagon for any new thing that comes out. Just makes me sick actually. I'm telling every potential employer the facts about LINQ - how it sucks so bad - and how I refuse to use it in my applications.
Why do you want to abstract something simple like SQL? Just makes no sense.
David on September 16, 2009 8:41 PMIt was a very nice idea! Just wanna say thank you for the information you have shared. Just continue writing this kind of post. I will be your loyal reader. Thanks again.
Wow gold on September 29, 2009 11:47 PM| Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved. |