While I may have mixed emotions toward LINQ to SQL, we've had great success with it on Stack Overflow. That's why I was surprised to read the following:
If you are building an ASP.NET web application that's going to get thousands of hits per hour, the execution overhead of Linq queries is going to consume too much CPU and make your site slow. There’s a runtime cost associated with each and every Linq Query you write. The queries are parsed and converted to a nice SQL Statement on every hit. It’s not done at compile time because there’s no way to figure out what you might be sending as the parameters in the queries during runtime.So, if you have common Linq to Sql statements like the following one ..
var query = from widget in dc.Widgets where widget.ID == id && widget.PageID == pageId select widget; var widget = query.SingleOrDefault();.. throughout your growing web application, you are soon going to have scalability nightmares.
J.D. Conley goes further:
So I dug into the call graph a bit and found out the code causing by far the most damage was the creation of the LINQ query object for every call! The actual round trip to the database paled in comparison.
I must admit, these results seem ... unbelievable. Querying the database is so slow (relative to typical code execution) that if you have to ask how long it will take, you can't afford it. I have a very hard time accepting the idea that dynamically parsing a Linq query would take longer than round-tripping to the database. Pretend I'm from Missouri: show me. Because I am unconvinced.
All of this is very curious, because Stack Overflow uses naive, uncompiled Linq queries on every page, and we are a top 1,000 website on the public internet by most accounts these days. We get a considerable amount of traffic; the last time I checked it was about 1.5 million pageviews per day. We go to great pains to make sure everything is as fast as we can. We're not as fast as we'd like to be yet, but I think we're doing a reasonable job so far. The journey is still very much underway -- we realize that overnight success takes years.
Anyway, Stack Overflow has dozens to hundreds of plain vanilla uncompiled Linq to SQL queries on every page. What we don't have is "scalability nightmares". CPU usage has been one of our least relevant constraints over the last two years as the site has grown. We've also heard from other development teams, multiple times, that Linq to SQL is "slow". But we've never been able to reproduce this even when armed with a profiler.
Quite the mystery.
Now, it's absolutely true that Linq to SQL has the performance peculiarity both posters are describing. We know that's true because Rico tells us so, and Rico ... well, Rico's the man.
In short the problem is that the basic Linq construction (we don’t really have to reach for a complex query to illustrate) results in repeated evaluations of the query if you ran the query more than once.Each execution builds the expression tree, and then builds the required SQL. In many cases all that will be different from one invocation to another is a single integer filtering parameter. Furthermore, any databinding code that we must emit via lightweight reflection will have to be jitted each time the query runs. Implicit caching of these objects seems problematic because we could never know what good policy is for such a cache -- only the user has the necessary knowledge.
It's fascinating stuff; you should read the whole series.
What's unfortunate about Linq in this scenario is that you're intentionally sacrificing something that any old and busted SQL database gives you for free. When you send a garden variety parameterized SQL query through to a traditional SQL database, it's hashed, then matched against existing cached query plans. The computational cost of pre-processing a given query is only paid the first time the database sees the new query. All subsequent runs of that same query use the cached query plan and skip the query evaluation. Not so in Linq to SQL. As Rico said, every single run of the Linq query is fully parsed every time it happens.
Now, there is a way to compile your Linq queries, but I personally find the syntax kind of ... ugly and contorted. You tell me:
Func<Northwinds, IQueryable<Orders>, int> q =
CompiledQuery.Compile<Northwinds, int, IQueryable<Orders>>
((Northwinds nw, int orderid) =>
from o in nw.Orders
where o.OrderId == orderid
select o );
Northwinds nw = new Northwinds(conn);
foreach (Orders o in q(nw, orderid))
{
}
Anyway, that's neither here nor there; we can confirm the performance penalty of failing to compile our queries ourselves. We recently wrote a one time conversion job against a simple 3 column table containing about 500,000 records. The meat of it looked like this:
db.PostTags.Where(t => t.PostId == this.Id).ToList();
Then we compared it with the SQL variant; note that this is also being auto-cast down to the handy PostTag object as well, so the only difference is whether or not the query itself is SQL.
db.ExecuteQuery( "select * from PostTags where PostId={0}", this.Id).ToList();
On an Intel Core 2 Quad running at 2.83 GHz, the former took 422 seconds while the latter took 275 seconds.
The penalty for failing to compile this query, across 500k iterations, was 147 seconds. Wow! That's 1.5 times slower! Man, only a BASIC programmer would be dumb enough to skip compiling all their Linq queries. But wait a second, no, wait 147 seconds. Let's do the math, even though I suck at it. Each uncompiled run of the query took less than one third of a millisecond longer.
At first I was worried that every Stack Overflow page was 1.5 times slower than it should be. But then I realized it's probably more realistic to make sure that any page we generate isn't doing 500 freakin' thousand queries! Have we found ourselves in the sad tragedy of micro-optimization theater ... again? I think we might have. Now I'm just depressed.
While it's arguably correct to say that every compiled Linq query (or for that matter, any compiled anything) will be faster, your decisions should be a bit more nuanced than compiled or bust. How much benefit you get out of compilation depends how many times you're doing it. Rico would be the first to point this out, and in fact he already has:
Testing 1 batches of 5000 selectsuncompiled 543.48 selects/sec compiled 925.75 selects/sec
Testing 5000 batches of 1 selects
uncompiled 546.03 selects/sec compiled 461.89 selects/sec
Have I mentioned that Rico is the man? Do you see the inversion here? Either you're doing 1 batch of 5000 queries, or 5000 batches of 1 query. One is dramatically faster when compiled; the other is actually a big honking net negative if you consider the developer time spent converting all those beautifully, wonderfully simple Linq queries to the contorted syntax necessary for compilation. Not to mention the implied code maintenance.
I'm a big fan of compiled languages. Even Facebook will tell you that PHP is about as half as fast as it should be on a good day with a tailwind. But compilation alone is not the entire performance story. Not even close. If you're compiling something -- whether it's PHP, a regular expression, or a Linq query, don't expect a silver bullet, or you may end up disappointed.
| [advertisement] JIRA 4 - Simplify issue tracking for everyone involved. Get started from $10 for 10 users. |
One of your best writings! And I do agree 100%.
Johann on March 19, 2010 5:34 AMIs this really a surprise? Most ORM tools will add overhead to your application, they simply aren't capable of any real query optimization.
Aside from the anonymous types, I'm not really sure there is any benefit here at all. Personally, I think your ADO.NET ExecuteQuery() is the easiest code to read. It also ensures that your object model evolves correctly, relying on an ORM to do this for you will inevitably lead to a spotty model.
Eric Sarjeant on March 19, 2010 5:35 AMExcellent review Jeff. I am glad that Linq-To-Sql worked for you. I couldn't even start because I didn't get Microsoft's Website-spark incentive, or any other incentive from them for that matter.
However I worked with mySql+ASP.Net MVC+jQuery stack and the performance seems quite wonderful even though I am crunching queries by the dozen every hour.
Stackoverflow's success is definitely awe-inspiring. The concept is excellent. That's why I would love to see more websites based on it out there. I am sure others will appear.
You've done a great job Jeff.
Cyril on March 19, 2010 5:36 AMYou use LINQ to SQL? I would have figured that your team would have wanted to move to the Entities Framework. Wouldn't their be a small benefit by changing to the Entities framework?
Sure you don't get the lazy load and other automated query generation things that LINQ to SQL provides, but the Entities framework is designed to mimic more of how SQL was designed.
We recently switched our product over to it and noticed some performance enhancements.
Mike on March 19, 2010 5:42 AMthere* (sorry no edit)
Another thought comes to mind, since precompiling makes sense for repetitious queries wouldn't it be safe to say that such framework elements of a site should use compiled queries?
I see for instances of sitemaps and queries that are not parameterized that would be the same across all users in the system that this would be a benefit.
Mike on March 19, 2010 5:45 AMWhat's the rough profile of Stackoverflow with respect to number of batches/queries per batch? I would image it's closer to a number of batches with one query each, hence further eliminating the benefit of compiled queries.
Izevaka on March 19, 2010 5:49 AMHey Now Jeff,
"scalability nightmares".?!? I didn't know you were a big fan of compiled languages. If S.O. is not as fast as it can be it's still is a stellar place to go.
Coding Horror Fan,
Catto
That's the Coding Horror I remember and miss!
Michael Hamrah on March 19, 2010 6:04 AM> All subsequent runs of that same query use the cached query plan and skip the query evaluation. Not so in Linq to SQL. As Rico said, every single run of the Linq query is fully parsed every time it happens.
You are confusing the query plan caching done by SQL Server with translating expression trees to SQL statements done in the client. The former is done for most LINQ to SQL queries too, as they translate to identical SQL statements and only the value of the parameters change. That means SQL Server will simply reuse the cached query plan for almost all LINQ to SQL queries. The latter has to be done for all uncompiled LINQ to SQL queries. Of course, this does not apply to raw SQL statements in the first place (they are already raw SQL). Considering you're talking about execution plan caching done by the database server for LINQ to SQL generated queries, those sentences are misleading and arguably wrong.
MehrdadA on March 19, 2010 6:14 AMIndeed, every abstraction has a computation cost and makes the code simpler (in theory, at least.) Since most performance problems stem from complex code, in most cases it makes sense to use abstractions.
Michał Tatarynowicz on March 19, 2010 6:16 AMGood call. I'd consider this in the same ballpark as the .NET vs. C argument. Of course C is technically faster at doing just about anything, but in real life, .NET is "good enough" that it's worth sacrificing a few milliseconds in favor of faster, more reliable, and easier development.
Joe Enos on March 19, 2010 6:18 AMGreat write-up.
At a recent users group in Atlanta, David Scruggs (http://blogs.msdn.com/dscruggs/) showed the benefits of using parallel LINQ queries in .NET 4.0 to gain huge performance gains over regular LINQ. I don't know if those benefits would carry over to LINQ-to-SQL, but it was quite impressive to see that the process of marshalling a foreach loop to different threads and bringing it all back together was being done for you and providing increased performance. Sure, compiling your own standard queries might be better than uncompiled LINQ queries in a lot of cases, but when you add in parallel processing that's handled for you, is there even a comparison?
BenMc on March 19, 2010 6:22 AMI always figured that LINQ to SQL was slower, but came to the conclusion when you have 10-15 queries on a page, what is an extra 100 ms? I have had mixed success with LINQ to SQL and sub queries from two different contexts, but that's neither here nor there.
As Joe Enos said, LINQ to SQL is "good enough"
Jeremyboyd on March 19, 2010 6:28 AMJeff... you're back! I hate to say it but I can't think of a really good coding article you've written in a while -- however, this one is right back at your peak. Articles like these are why I first started following your blog. Thanks for the post. I look forward to more like this.
Very well put. You addressed my concerns with LINQ and came to conclusions that make a lot of sense.
Robert McRackan on March 19, 2010 6:38 AMVery well reasoned. I see lots of developers obsessing over pointless optimizations. I think it is a sign of experience when a developer learns to trust the platform. An example I've seen recently is someone going to great lengths to avoid disk accesses to a relatively small file with a complicated caching algorithm, completely ignoring the fact that Windows had already cached the entire file for him. He no-op'd out the caching code and, surprise! Same performance ;-)
Timothy on March 19, 2010 6:44 AMDam it. I want my money back.
I thought this was a baiting post about PHP.
John Wards on March 19, 2010 6:51 AM@John Wards, funny thing is I believe most ASP.Net developers (of average intelligence that aren't language fanboys/gurlz) have come to the conclusion that PHP and .Net can be just as fast and as slow as each other. And going along with Jeff's "micro-optimization theater" post, shaving off 5 ms for an function call isn't going to do anyone any good, because over 1000 request, thats only 50 seconds.
And lets face it, everyone can spare a few minutes of there time every day ;)
Jeremyboyd on March 19, 2010 7:04 AMWasn't the whole reason you used Linq that regular old SQL wasn't "good enough" because it was slow?
Now Linq can be slow, but that's ok, because it's "good enough." :)
(Just had to give you a hard time)
Jeffrey Davis on March 19, 2010 7:07 AMCompiled queries are thread safe so you save them in static fields and reuse them across requests.
Anyway depending on percentage of CPU the queries take relative to the entire request, I'd say compiling queries might give you a very small percentage of improvement (posibly below 1% on average considering page caching).
Having non compiled queries in you wesite is like having a small piece of intepreted code that runs on every request(just consider you're running a small piece of PHP) you won't take a scalability hit because of that, its' ridiculous. If non compiled queries kill your scalability because they are parsed each run, what about sites written entirely in interpreted languages like PHP, Ruby, Perl, Python, aren't those scalable at all?
P.S. This is another missuse of the work "scalable", people don't seem to know what it means, the word here should have been more heavy weight, compiled or non compiled is not a scalability issue, both of them scale the same way, but one is ligher other is more heavy.
I work in an environment where security is important. We implement this using stored procedures to provide an API to the data - so no direct access to tables or views. This shifts some work into SQL but ultimately means that we have fairly simple code and minimal work when we decide to change the DB structure. It also means we don't really touch linq/sql - which I suspect might be a good thing.
Jonathan Evans on March 19, 2010 7:38 AM@ Michael Hamrah : "That's the Coding Horror I remember and miss!" Me too.. a lot of silence and lighter posts lately on this blog.
@ John Wards : "Dam it. I want my money back. I thought this was a baiting post about PHP." LOLLLL :)))))
@ Pop Catalin : "P.S. This is another missuse of the work "scalable"" Yes, you are so right about that. Performance and scalability, although closely in hand, don't mean the same thing. The way I see it scalability means the ability of a system to increase its throughput by "throwing" more machines at it.
Andrei Rinea on March 19, 2010 7:43 AMBravo! Great article. I will tell my team to get their stopwatches ready. I wonder how much performance is still hidden in our applications.
Holger Hinzberg on March 19, 2010 7:44 AM@Jonathan Evans,
You are probably aware of this, but I thought I would mention it in case you are not. LINQ has fairly decent support for stored procs. I have no idea what the specifics of your environment are, but you might be able to get some of the benefits of LINQ while still maintaining the security and performance you are seeing with your stored procs.
Jasonjackson on March 19, 2010 7:49 AMJeff is back!!!
Great post. Right on the money.
I read the MSDN article this month on compiled queries and I knew there was something that didn't quite taste right. You have spotted it and put some salt on it. Good job and thanks!
Wouldn't it be awesome if C# had macros and you could write a quick macro to do the conversion from the Linq to the ugly version of the code, and then use that macro when the performance optimization did matter.
Sadly, I'm not even entirely sure whether C# has macros of this nature or not, because I'm trapped in v2.0 of the .NET framework. Woe is me.
Joshua on March 19, 2010 8:26 AM@ Joshua : "Sadly, I'm not even entirely sure whether C# has macros of this nature or not, because I'm trapped in v2.0 of the .NET framework. Woe is me"
You can very well define an #if DEBUG ... #endif region
Also, there are certain source code processing tools out there...
Andrei Rinea on March 19, 2010 8:33 AMMaybe I'm just not awake yet, but how is this:
db.PostTags.Where(t => t.PostId == this.Id).ToList();
generating 500k queries? It's 1 query returning 500k results...
Robin on March 19, 2010 9:00 AM@Andrei Rinea - C# preprocessor directives are a long way off from real macros like lisps have:
http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html
And sure you could use some sort of preprocessor, or aop or whatever, but any of those solutions would be a hack compared to having language support for macros.
Robin on March 19, 2010 9:03 AMYou are putting the compiled query syntax in a bad light.. It would be a better example if you take advantage of type inference:
var q = CompiledQuery.Compile(
(Northwinds nw, int orderid) =>
from o in nw.Orders
where o.OrderId == orderid
select o);
Or if it's a class field declaration:
Func, int> q = CompiledQuery.Compile(
(nw, orderid) =>
from o in nw.Orders
where o.OrderId == orderid
select o);
Sébastien Lorion on March 19, 2010 9:17 AM
Hum, I guess I should have previewed my comment ...
Func<Northwinds, IQueryable<Orders>, int> q = CompiledQuery.Compile(
(nw, orderid) =>
from o in nw.Orders
where o.OrderId == orderid
select o);
Sébastien Lorion on March 19, 2010 9:19 AM
I agree that Compiled Queries aren't that bad and their benefits out way any sort of negative feelings of their aesthetics. I usually write the query first, make sure it works in LinqPad or in the app then convert to Compiled Query when I'm ready.
As a note to gain the benefits of compiling your query you really need to make it static or you are actually making things worse since the compilation adds overhead and it would happen on every request.
Here is a good post on this subject
Greg Roberts on March 19, 2010 10:02 AM@JeremyBoyd "shaving off 5 ms for an function call isn't going to do anyone any good, because over 1000 request, thats only 50 seconds."
Actually that's only 5 seconds.
Turnkey on March 19, 2010 10:54 AMI like Linq-to-SQL. I think it makes a lot of things a lot easier.
However, because I'm a speed freak I've adopted a slightly different strategy. Plainly, I like to use Linq-to-SQL for all of the one-record CRUD statements. I'm talking things like updating the user's name, selecting their email address, adding a new user, etc. These statements are all very simple even on the SQL side of things, but writing them in Linq makes them even easier. And in fact these type of statements typically account for 90% of the queries from my sites.
The other 10% of queries are more involved with complex joins, sometimes utilizing full-text searching and they act on a much larger set of data. So with those queries I go all out. I'll write an efficient stored proc so that SQL server can cache those query plans aggressively and get the last bit of performance out of them. And then I'll call the stored proc from ADO.NET with the most performant option, the SqlDataReader. It's a little more code to do all of this. But to me it's worth it.
Steve Wortham on March 19, 2010 10:57 AMNice one Jeff. Sorry to lead you astray.
If you notice near the middle of my article I did mention that the reason this even was an issue was because I was doing orders of magnitude more queries than I should have been. I just found the whole situation quite interesting, in that the query compile overhead took more time than the SQL database query (the DB was running locally with the web server). For that low usage application it wasn't worth the time to do the macro optimization and the micro was fine. Sometimes crafting a better algorithm isn't worth the time. :)
I now use Linq to SQL very heavily in online games with millions of users, partitioned databases, and tens of thousands of simultaneous requests. Very few queries are compiled due to well thought out algorithms. Now our biggest CPU bottleneck is Linq to SQL's dynamic update system that uses reflection, so we use SPROCs for critical areas. But, all-in-all, the ROI in Linq to SQL is great.
JD Conley on March 19, 2010 11:28 AMOracle and Sybase ASE do cache SQL statements and execution plans that differ in literal values, even if you don't parameterize them. SQL server doesn't. This isn't Linq-to-SQL's fault, but rather SQL server's fault.
Shachart on March 19, 2010 11:31 AMNice article Jeff and something to get our teeth into. I have a data aggregator and roll-up application that's been behaving badly. It uses Linq to SQL heavily (the queries themselves aren't hellishly complex, they just operate on single tables and there's no joins) and I've been meaning to get time to profile it (code and SQL) properly. This article has given me the boot up the backside to go do something about it.
More of this kind of thing please! :)
Kev on March 19, 2010 12:34 PMMinor nitpick:
"[T]housands of hits per hour" is one or just a few hits per second. That sounds like low traffic to me.
(Raise the number by a couple of orders of magnitude and the discussion makes much more sense. Let's assume that.)
Liw on March 19, 2010 12:42 PMJeff -
"Stack Overflow has dozens to hundreds of plain vanilla uncompiled Linq to SQL queries on every page."
Could you elaborate on this a bit? Curious to know (the db accesses required to construct a page) what's requiring double to triple-digit numbers of queries on a page?
Thanks
Matthew Krieger on March 19, 2010 4:59 PMI appreciate your point that going back and re tooling now is a waste of time. I agree. But um, what's so hard SQL in the first place?
Heath Noble on March 19, 2010 7:36 PMI really appreciate your point on the queries.
Nishit Sinha on March 20, 2010 3:08 AMWelcome to the debate in using Python versus C#. Yes, there are other things people argue about the two languages (like static typing and so on).
In my experience, what anyone who argues in favor of C# will boil down to is "C# is faster", and yes, it is. But when the whole stack is benchmarked, you'll notice that a C# application versus a Python application gives response times in that same order of magnitude: a third of a millisecond or so.
toxik on March 20, 2010 5:50 AM@Shachart - Not true. SQL Server 2000 and later will perform auto-parameterization of queries in an attempt to reuse execution plans. (That being said, you really should be using explicit parameters wherever possible.)
See here: http://msdn.microsoft.com/en-us/library/aa175264(SQL.80).aspx
and here: http://msdn.microsoft.com/en-us/library/cc966425.aspx
RobW on March 20, 2010 6:45 AM> All subsequent runs of that same query use the cached query plan and skip the query evaluation. Not so in Linq to SQL. As Rico said, every single run of the Linq query is fully parsed every time it happens.
No, what Rico said has nothing to do with cached query plan. He said that in every single run, the Linq query expression has to be parsed and converted to SQL. This is all happening in the app, whereas the query plan resides in the db.
Like Shachart said, whether or not a cached query plan will be used is database dependant.
Also, when you use ORM, you always do "5000 batches of 1 selects". No sane person will use ORM to do "1 batches of 5000 selects". That's why Rico concluded his blog post with this statement:
"In short, if you expect to reuse the query at all, there is no performance related reason not to compile it."
sayap on March 20, 2010 8:05 AMIt is totally ununderstandable. Is this about converting a linq query into a sql query in the client/middle tier? Or is it about a RAM hash table in the database server with the execution plans of already executed queries?
So can anyone address MehrdadA's remarks and questions?
The hash table with the execution plans is in Oracle a shared resource (all db connection use the same hash table), and because it is shared it is protected with spin locks. When there are many connections those spin locks will become a bottleneck because too many connections want to mutate the hash table and have to wait for a lock to continue.
This means that a proper performance benchmark has to be multithreaded benchmark if you want to load test an OLTP application! Has Jeff done a single threaded test or a multi threaded test?
So it is better to use parameterized queries in Oracle. Caching sql statements that differ in literals is also possible but you have to set a certain parameter. After setting this parameter, the oracle server will replace all literals with parameters and cache the execution plans.
However using parameterized queries is the preffered way in an Oracle based OLTP application.
Does Linq2Sql produce queries with parameters or litarals? Do you have to compile those queries if you want parameterized queries?
It is also very important to use parameterized queries if you want a good performance when you use Sqlite. Sqlite will become two or three times slower if you don't use parameterized queries, you can't call that a micro optimization. Sqlite doesn't have a setting to chance this
behaviour.
Surely the big lesson here is that instead of blindly following the advice of Rico Mariani (or Jeff Atwood) we should _test_ to see whether compilation makes a meaningful difference _in_our_particular_application_.
Jeff is conducting an implicit test, by leaving his queries uncompiled and seeing how that turns out. It's turned out fine in his case. I conducted the same implicit test on a recent project and it wasn't fine. Performance was particularly bad when using a very broad inheritance hierarchy of L2S entities on the 32-bit version of the CLR.
@Jeff: By the way, your comparison may not be entirely valid. You wrote: "Then we compared it with the SQL variant; note that this is also being auto-cast down to the handy PostTag object as well, so the only difference is whether or not the query itself is SQL." The point to note is that materializing the objects (i..e. constructing your PostTag objects) is something that is greatly improved by compilation. In other words, L2S compilation does not just affect running the SQL, it also affects instantiating the result set objects. (Rico covered this in one of his posts IIRC) I believe a more realistic test is to run two versions of a linq query - one compiled and one not. (Leave ExecuteQuery out of the picture totally). When I run such a test, the compiled version takes about 25% of the time of the uncompiled version. That's for simple queries. When hitting a table that contains a very broad inheritance heirarchy, under the 32-bit version of the CLR, the performance differences is orders of magnitude.
John Rusk on March 20, 2010 10:32 PMFirst of all, Rico's article is 3 years old. The one I read anyway.
It's not true that all LINQ to SQL queries are evaluated by the database. All LINQ queries generate parameterized SQL queries meaning the execution plan is evaluated ONCE and cached on the server.
I don't know where Rico got his information from.
Vince Panuccio on March 21, 2010 2:14 AM@John Rusk: Surely the big lesson here is that instead of blindly following the advice of Rico Mariani (or Jeff Atwood) we should _test_ to see whether compilation makes a meaningful difference _in_our_particular_application_.
No. We should write our apps in the way that is most easily maintained, and not even worry about whether compilation would make a meaningful difference unless and until we determine that we need to speed things up, and then only after we've established by profiling that the compile time is what is slowing us down.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
- Donald Knuth
@jdege
That's the implicit test I mentioned: build it, and see if you have a problem. I did not mean to imply that was a bad thing to do. Generally, it's my preferred approach too.
My concern with this thread is that _in_some_circumstances_ LINQ to SQL compilation is not one of the "small efficiencies" which Knuth criticizes, it can be a make or break issue for an application. This thread seems to inappropriately trivialize that possibility.
I'm not advocating doing something "just because its faster", at the expense of maintainability; but rather being aware of problems which may arise, so we can make sure to test and profile the application thoroughly enough, and early enough.
@Vince Panuccio
Jeff didn't fully understand Rico's post, and then misguided you with his incorrect interpretation. Rico's post is about LINQ to SQL compilation, which happens at the client/app side. After that, whether a cached query plan will be used, is database/driver/config dependent.
sayap on March 21, 2010 8:14 PM@ RaymondC : "Does Linq2Sql produce queries with parameters or litarals? Do you have to compile those queries if you want parameterized queries?"
LINQ to SQL produces parameterized queries, not literals.
You do not have to compile those queries to have parameterized queries. Parameterized queries are the only output option.
Andrei Rinea on March 22, 2010 2:24 AMLike Saint Jerry Pournelle used to say "better is the enemy of good enough". (I miss Byte, I really do).
I work with Rails and HAML is supposed to be faster for dynamic content. But, it doesn't look like HTML and designers freak out (sometimes). I lost work one time because the guy who was paying me didn't understand it.
So, you pick HAML because it meets some other need. Raw speed is irrelevant, particularly if you're cacheing (and you are, aren't you?).
Like you say, Jeff - micro optimisation is sub optimal.
Francis Fish on March 22, 2010 3:37 AMAwesome post. I just started using LINQ to SQL at my new job and was curious how it stacked up performance-wise. Thanks for the info.
Abe Miessler on March 22, 2010 7:18 AM@John Rusk: "My concern with this thread is that _in_some_circumstances_ LINQ to SQL compilation is not one of the "small efficiencies" which Knuth criticizes, it can be a make or break issue for an application."
My concern is that folks who'd read this thread, when dealing with an application that clearly has inefficiencies that are a make or break issue, will jump immediately to the conclusion that since they'd read about LINQ to SQL compilation being a problem, it must be their problem.
It might be, or it might not be. There are plenty of other possibilities. It's important that you identify that this is your problem, before you waste effort on trying to fix it.
The article in which Knuth made his "premature optimization" remark is available online: Structured Programming with go to Statements. The succeeding paragraph is instructive:
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.jdege on March 22, 2010 8:22 AM
Microsoft has ended support for Linq To SQL and moved the team to Entities Framework. And most people prefer to us Fluent Nhibernate anyway because it is compiled.
Ric Johnson on March 22, 2010 12:58 PMI have analyzed this a lot and I think YOU ARE WRONG!!!!!
(I apologize if I am way off on this, but...)
Think about it. Provided that you are caching the compiled queries at an Application level, you are NEVER running the query JUST ONCE!!! If you store it at an Application level, it is being run for every single user, every time they hit that page!!!
You DON'T create a new compiled query every page load and use it once. You compile it at application start up or when it is first used, and then used that compiled query in EVERY SINGLE CALL TO THE DAL!
Am I right?
Skippyfire on March 22, 2010 1:45 PMNo, you're not exactly right Skippyfire... when you do the straight up LINQ thing, it gets JIT compiled... so it happens every time. That's kinda the whole point about the article - that the JIT method is not so good.
@Jasmine: Your statement is somewhat confusing, so let me be as clear as I can.
Compiled queries are ALWAYS faster. In an ASP.NET Application, when do you use a query JUST ONCE? Never! That query is used by thousands of different requests!
So the usage scenario of using a query "just once" never occurs. I hope I am explaining myself better.
As for this whole thing being a micro optimization, that might be true... but I still think the "performance tests" from Jeff and Rico are highly misleading.
Skippyfire on March 23, 2010 6:45 AMOh yeah, you just need to make sure that the compiled query is being "cached". So you need to store it in an application-wide location so it doesn't get compiled every time.
Skippyfire on March 23, 2010 6:49 AMWell that's what the article is saying you should do, compile the query and reuse it... but LINQ doesn't do this for you, at application compile time. During the application runtime, LINQ does a JIT compile on every query, every time, unless you say otherwise. It seems to be a minor performance hit compared to the database shot anyway, but it's interesting to know.
What worries me about LINQ in general is that it seems to promote lazy database design and usage, and this can hurt performance as well as stability.
Jasmine Adamson on March 23, 2010 8:17 AM@Jasmine: I don't think that LINQ or EF promote lazy database design at all! In fact, having a poorly designed database will probably cause more problems with these ORMs than if you were using straight SQL. Either way, your database should use primary keys, foreign keys, indexes, etc.
The benfits of the ORMs is that you don't have to write the rudimentary database access code, and get to use LINQ queries right away.
Could you elaborate on how they promote lazy design?
Skippyfire on March 23, 2010 8:41 AM+1 anyone who commented that LINQ to SQL creates parameterized SQL queries that are re-used efficiently at the database.
Compiled queries aren't very flexible because they don't support dynamic composition, which is arguably one of the best features of LINQ (and LINQ to SQL as well). The shape of a compiled query is static, and you can only perform parameter substitution when it is executed - you can't change its shape at runtime.
This is fine for your basic SingleOrDefault() or Where() queries, but
the vast majority of queries I need to write require some form of dynamic composition (add a filter when these parameters are specified, change the projection when this parameter is specified, etc).
While you could attempt to factor a dynamically composed query into multiple static compiled queries, I return to the maxim, if it ain't broke, don't fix it!
Compiling queries just for the sake of it is premature optimization at its worst, and shouldn't be done without exacting proof from a profiler, and an incumbent performance problem.
Phaedrus30 on March 23, 2010 5:43 PM@Skippyfire
You are quite correct. A compiled query should be cached at the application level and be reused between requests.
Jeff's statement "But then I realized it's probably more realistic to make sure that any page we generate isn't doing 500 freakin' thousand queries!" can be misleading. I expect what Jeff means is that with few queries per page, and a small overhead per non-compiled query, the extra response time for that page won't add up to much. Which makes sense.
But it is easy to misread what Jeff said to mean that he thinks compiled queries are only cached for the duration of a request, such that if you don't have many queries per request, the compiled ones won't be reused much and compilation might actually hurt performance. But this, of course, would be unrealistic, since the compiled queries would be cached at application level, and thus if you have multiple requests to the same page, calling on the same query, the compiled one will be used (except the first time the page is called) and performance will improve.
With 1.5 million hits per day on Stack Overflow, a compiled query cached at application level should find tremendous reuse and possibly shave a reasonable chunk of load from the server.
At the same time, each page might still load only marginally quicker, which is what Jeff observes. In other words, it is the type of optimization that may do more (in an already well balanced system) towards lowering the electricity bill for the server owner rather than make significant impact on response times for the end users.
Mats Helander on March 26, 2010 6:53 AMLOL. Looks like you did a sufficient job of inspiring one hell of a micro-optimization theatre in the comments with this thread.
I completely agree that LinqToSQL is very convenient and useful tool. Since I learned it, I have had only one instance where I'd use traditional SPROCS over it and that required an extensive read/process/write step to occur in no more than 100ms (obviously not web stuff).
If there is no better example, the comment thread of this post poses a perfect case where the word scalability has become the next gen ::cringe::paradigm::/cringe:: "web 2.0" of software development.
Meaning that, any jackass who can write a loop and store a date/time value will be partaking in creating comprehensive statistical explanations of why solution x is faster than solution y. Welcome back to bike shed painting 101. Disclaimer: I can't claim complete innocence, I have painted that bikeshed my fair share of times too.
I think, the real concept of scalability represents 2 things.
1. Raw performance
I if the number of executions is increased at a linear rate is the processing time growth linear, exponential, or lograrithmic. I'd consider logarithmic growth = scalable and optimized, linear growth = scalable, and exponential growth = not scalable.
Ex. If your site grows 10x in popularity is it going to need 5x, 10x, or 100x the servers to keep up with the demand?
2. Application Domain
You put the difference into perspective perfectly. "Let's do the math, even though I suck at it. Each uncompiled run of the query took less than one third of a millisecond longer." Once again, you prove that you're smarter than the average code monkey and that's why I like your blog. I can actually feel myself not grow dumber the longer I read your material. :P
Unfortunately, scaling has hit the "scene" and all the code monkeys have their panties in a bunch. There's masses of really bad/incorrect examples going around about optimization. It's gonna take a colossal amount of panty un-twisting to fix it.
For websites specifically. I'd specify the domain range as 7 seconds. You have 7 seconds to load everything before the tip of the ADHD afflicted masses start to flee in troves, followed by average people (15 sec), and finally the brutally patient (whatever the timout rate is). If your page can't load in less than 7 seconds you're doing it wrong. There are much better optimizations that can occur in this scope (eliminating file requests from the server, or from multiple servers::cringe::) than trimming off a few ms from a DB query.
My .02 on perf.
@ Ric Johnson
Although most people don't know. Linq is about more than SQL ORM. Linq can be performed on XML, Collections, etc... For anybody that has used Linq extensively, it's pretty obvious that the functionality doesn't do a good job of supporting changes in the database model.
I think they're also leading away from LinqToSql because it only supports MsSql. What about all of the other modeling systems out there? LinqToOracle, LinqToMySql, LinqToPostgre, etc... The name and application itself isn't general enough to cover all the systems that people will eventually expect of it. Neither Linq or LinqToSql are going anywhere. Linq will be around forever, and LinqToSql will sit quietly in the .Net framework and do what it does best. I think their development emphasis will just be directed more toward Entities for the ORM part and LinqToEntities for the querying part.
Question for the masses: in the last decade there has been so much talk about the N-tier approach and separating presentation, business, data access and data. So, how does using LinqToSql fit into this?
I love the idea of using the SQL db just for storage. So, without using LinqToSql as "inline SQL" (meaning in the presentation layer), are you suggesting we keep all LinkToSQL in the data access layer?
PS. great post!
VdomainsHosting on March 30, 2010 7:31 AMThis was an incredibly exciting understand. Hostgator I will bookmark you website to verify on it after.
The whole point here is the LINQ to SQL compilation time is trivial. Its only a few extra ms per call, and that is happening on the web server, not the database server.
So, spending time compiling those queries will only give a slight boost in performance and it is not worth doing if your only gaining a few extra ms.
I'd be more worried about the SQL that the LINQ compiler is generating. After all, the database runs SQL statements not LINQ ones. A little tougher to debug since the actual SQL is abstrated out by LINQ. But you can still trace things out and see what is going on in SQL land.
Anyway, you want to avoid this, write a stored proc, open a connection and run the proc with a command. Process the results with a data reader. Then there is 0 compilation time with LINQ.
astro Your blog is a true mine of information, I am an avid reader and I wish you good continuation
lora on August 10, 2010 4:41 AMMediums I stumbled upon your blog very interesting! ! A quick hello from a person who has always appreciated
lora on August 10, 2010 4:42 AMtarot it’s good to see this information in your post, i was looking the same but there was not any proper resource.
lora on August 10, 2010 4:43 AMmedium Some talk too much without saying anything! at least here I find what looks so clear and precise...
lora on August 10, 2010 4:44 AMThis is only a preview. Your comment has not yet been posted.
As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.
Having trouble reading this image? View an alternate.
| Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved. |
Posted by: |