August 24, 2008
You may have noticed that my posting frequency has declined over the last three weeks. That's because I've been busy building that Stack Overflow thing we talked about.
It's going well so far. Joel Spolsky also seems to think it's going well, but he's one of the founders so he's clearly biased. For what it's worth, Robert Scoble was enthused about Stack Overflow, though it did not make him cry. Still, I was humbled by the way Robert picked this up so enthusiastically through the community. I hadn't contacted him in any way; I myself only found out about his reaction third hand.
That's not to say everything has been copacetic. One major surprise in the development of Stack Overflow was this recurring and unpredictable gem:
Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Deadlocks are a classic computer science problem, often taught to computer science students as the Dining Philosophers puzzle.
Five philosophers sit around a circular table. In front of each philosopher is a large plate of rice. The philosophers alternate their time between eating and thinking. There is one chopstick between each philosopher, to their immediate right and left. In order to eat, a given philosopher needs to use both chopsticks. How can you ensure all the philosophers can eat reliably without starving to death?
Point being, you have two processes that both need access to scarce resources that the other controls, so some sort of locking is in order. Do it wrong, and you have a deadlock -- everyone starves to death. There are lots of scarce resources in a PC or server, but this deadlock is coming from our database, SQL Server 2005.
You can attach the profiler to catch the deadlock event and see the actual commands that are deadlocking. I did that, and found there was always one particular SQL command involved:
UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0
When it detects a deadlock, SQL Server forces one of the deadlocking commands to lose -- specifically, the one that is cheapest to roll back. The statement on the losing side varied, but in our case the losing deadlock statement was always a really innocuous database read, like so:
SELECT *
FROM [Posts]
WHERE [ParentId] = @p0
(Disclaimer: above SQL is simplified for the purpose of this post). This deadlock perplexed me, on a couple levels.
- How can a read be blocked by a write? What possible contention could there be from merely reading the data? It's as if one of the dining philosophers happened to glance over at another philosopher's plate, and the other philosopher, seeing this, screamed "meal viewing deadlock!" and quickly covered his plate with his hands. Yes, it's ridiculous. I don't want to eat your food -- I just want to look at it.
- We aren't doing that many writes. Like most web apps, we're insanely read-heavy. The particular SQL statement you see above only occurs when someone answers a question. As much as I want to believe Stack Overflow will be this massive, rip-roaring success, there just cannot be that many answers flowing through the system in beta. We went through our code with a fine-tooth comb, and yep, we're barely writing anywhere except when users ask a question, edit something, or answer a question.
- What about retries? I find it hard to believe that little write would take so incredibly long that a read would have to wait more than a few milliseconds at most.
If you aren't eating -- modifying data -- then how can trivial super-fast reads be blocked on rare writes? We've had good results with SQL Server so far, but I found this behavior terribly disappointing. Although these deadlocks were somewhat rare, they still occurred a few times a day, and I'm deeply uncomfortable with errors I don't fully understand. This is the kind of stuff that quite literally keeps me up at night.
I'll freely admit this could be due to some peculiarities in our code (translated: we suck), and reading through some sample SQL traces of subtle deadlock conditions, it's certainly possible. We racked our brains and scoured our code, and couldn't come up with any obvious boneheaded mistakes. While our database is somewhat denormalized, all of our write conditions are relatively rare and hand-optimized to be small and fast. In all honesty, our app is just not all that complex. It ain't rocket surgery.
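An aside for anyone chasing similar deadlocks: besides attaching the Profiler, SQL Server 2005 can write deadlock details straight to its error log via a trace flag. A minimal sketch:

-- log detailed deadlock graphs to the SQL Server error log, server-wide
DBCC TRACEON (1222, -1);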
If you ever have to troubleshoot database deadlocks, you'll inevitably discover the NOLOCK hint. It works like this:
SELECT *
FROM [Posts] WITH (NOLOCK)
WHERE [ParentId] = @p0
The NOLOCK hint itself is SQL Server-specific, but the underlying idea isn't: it drops the query to the read uncommitted transaction isolation level, also known as "dirty reads" -- the lowest possible level of locking. MySQL supports the same isolation level; Oracle doesn't allow dirty reads at all, since its readers never block writers in the first place.
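For reference, the hint has a session-level equivalent; a minimal sketch, reusing the post's simplified query:

-- session-level equivalent of the per-query (NOLOCK) hint above
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT *
FROM [Posts]
WHERE [ParentId] = @p0;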
But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think:
In theory there is no difference between theory and practice. In practice there is.
I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.
But in practice, adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems. I asked around, and I got advice from a number of people whose opinions and experience I greatly trust, and they, to a (wo)man, all told me the same thing: they've never seen any adverse reaction to using nolock -- as long as you know what you're doing. One related a story of working with a DBA who told him to add nolock to every query he wrote!
With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it's never wrong or garbled or corrupted in a way that will crash you. And honestly, most of the time, who cares? If your user profile page is a few seconds out of date, how could that possibly matter?
Adding nolock to every single one of our queries wasn't really an option. We added it to all the ones that seemed safe, but our use of LINQ to SQL made it difficult to apply the hint selectively.
I'm no DBA, but it seems to me the root of our problem is that the default SQL Server locking strategy is incredibly pessimistic out of the box:
The database philosophically expects there will be many data conflicts, with multiple sessions all trying to change the same data at the same time, and that corruption will result. To avoid this, locks are put in place to guard data integrity ... there are a few instances, though, when this pessimistic heavy-lock design is more of a negative than a positive benefit, such as applications that have very heavy read activity with light writes.
Wow, very heavy read activity with light writes. What does that remind me of? Hmm. Oh yes, that damn website we're building. Fortunately, there is a mode in SQL Server 2005 designed for exactly this scenario: read committed snapshot:
Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot, of every data change. By reading these snapshots rather than live data at times of contention, shared locks are no longer needed on reads, and overall database performance may increase.
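Enabling it is a one-line change, though the statement needs exclusive access to the database to take effect; a sketch, with a hypothetical database name:

-- reads under the default read committed level now use row versions
-- instead of shared locks
ALTER DATABASE [StackOverflow] SET READ_COMMITTED_SNAPSHOT ON;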
I'm a little disappointed that SQL Server treats our silly little web app like it's a banking application. I think it's incredibly telling that a Google search for SQL Server deadlocks returns nearly twice as many results as a search for MySQL deadlocks. I'm guessing that MySQL, which grew up on web apps, is much less pessimistic out of the box than SQL Server.
I find that deadlocks are difficult to understand and even more difficult to troubleshoot. Fortunately, it's easy enough to fix for our particular workload by setting read committed snapshot on the database. But I can't help thinking our particular database vendor just isn't as optimistic as it perhaps should be.
Posted by Jeff Atwood
"Using parameterised SQL statements (as in the shown code) can cause SQL to not use indexes, thus causing table scans, which can cause table locks."
@Fran, this might have been true before parameter sniffing was introduced (IIRC in MSSQL 7), or if you have way-out-of-date statistics.
My experience is that SQL Server is quite aggressive about using indexes for equals / LIKE 'x%' queries (as you would expect), but it is unlikely to use indexes for a range scan, or if you are issuing SELECT *, which means both the index and the table row records need to be read.
As always, don't ASSUME anything. If you want to be sure then design a test case and PROVE it. (I did this on a StackOverflow thread and surprised myself!)
I have seen something similar to this before. As another poster mentioned, if the select or the write requires a table scan then that will require a table lock and they can conflict. However, just adding an index may not help - and I would indeed assume that you already have that.
Using parameterised SQL statements (as in the shown code) can cause SQL to not use indexes, thus causing table scans, which can cause table locks.
So if you declare a variable and then use that variable in a SQL statement, the optimiser may choose not to use an index. I believe something similar is true for prepared statements - sometimes.
Because the database is small and all in memory, you are unlikely to even notice that table scans are happening.
You need to use the SQL Profiler with more detail, combined with the Performance Monitor, to check the number of locks and the number of table scans. Then try looking at the estimated execution plan in Query Analyser - using variables/prepared statements rather than substituting the values in.
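One way to test this theory is to force a fresh compile for the actual parameter value and compare plans; a sketch against the post's simplified query (OPTION (RECOMPILE) exists in SQL Server 2005):

DECLARE @p0 INT;
SET @p0 = 42;  -- hypothetical parent id

-- compile a fresh plan for this exact value instead of reusing
-- a generic parameterised plan
SELECT *
FROM [Posts]
WHERE [ParentId] = @p0
OPTION (RECOMPILE);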
Wasn't the diner a drive system for some ship in one of the Hitchhiker's Guide to the Galaxy books?
@Chris Not sure if you're coming back to read this, but anyway, I just wanted to give you a thumbs up for your comments. I'm glad to see that someone has a different perspective on things. You also inspired me to do a little rant on stupid questions (hope you don't mind me quoting you):
(Sorry about hijacking your post, Jeff, but he didn't leave any contact information.)
One has to be careful using the (nolock) hint in SQL Server, because it is possible for the query to not return all the expected results!
I have spent many days trying to figure out why an SSRS report never returned consistent results or matched the information in Microsoft Dynamics AX (an ERP system with SQL Server as a back end). This was due entirely to the (nolock) hint causing SQL Server to not return records that had a lock on them (hence the same query wouldn't always return the same rows). You need to SET TRANSACTION ISOLATION LEVEL to READ UNCOMMITTED, or use the SNAPSHOT isolation level you mentioned in your post.
One thing of note with READ COMMITTED SNAPSHOT: it uses tempdb to hold the row versions, so your tempdb files should be on an appropriate disk setup, since tempdb gets used more heavily.
Questions: http://www.hidesato.com/contact/ (until I can get a contact type form setup on my personal website - http://siliconsea.net/)
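For reference, the snapshot route the commenter above mentions takes two steps in SQL Server 2005; a sketch, with a hypothetical database name:

-- step 1: let transactions opt in to snapshot isolation (done once, per database)
ALTER DATABASE [StackOverflow] SET ALLOW_SNAPSHOT_ISOLATION ON;

-- step 2: opt in, per session or transaction
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;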
The dining philosophers problem has been coined a classic synchronization problem neither because it's practically applicable nor because computer scientists dislike philosophers...
- I read it in an operating systems book
After all the wailing and slagging off of SQL Server, the issue wasn't with SQL Server at all!
For fun, go back through all the clueless comments and laugh at the stupid attacks on SQL Server, which have once again been shown to be unfounded.
Ignorance truly is bliss for most developers.
Jeff should write a follow-up post about being careful not to jump to conclusions when you don't understand the problem/software/library.
We figured out our deadlock problem, finally. It had to do with Log4Net being called during a database operation, itself triggering another database operation and log event. This is normally OK, but the events were happening so close together that it resulted in a deadlock.
SQL Server still works fine. Programmers still blame their platform for their mistakes.
Jeff said - We figured out our deadlock problem, finally. It had to do with Log4Net being called during a database operation, itself triggering another database operation and log event. This is normally OK, but the events were happening so close together that it resulted in a deadlock.
@JacKU - being careful not to jump to conclusions when you don't understand the problem/software/library.
I agree. You can easily come to an incorrect conclusion even if you follow a very logical path. I find the forums really useful. How many problems/errors/issues do you come across where nobody in the world has ever come across that issue before, in any form? Not very often. Being part of online forum communities is becoming essential, because "whoops" problems like this just shouldn't happen if proper communication through forums is occurring.
We can't know everything, but everyone can know something.
The article as well as the discussion just confirms what I've encountered in my professional experience - many experienced application developers don't have a good understanding of the most basic DBMS properties: ACID, concurrency control, isolation levels... All of this I learnt at uni; I can't remember if it was second or third year.
The underlying logical and theoretical essentials should take no more than a dozen hours to comprehend. RDBMS vendor-specific features are a few hours on top. What's everyone's excuse for not investing such a relatively insignificant amount of time?
Jeff's having problems with a simple web app. The scary part is that there are much more demanding apps designed by similarly ignorant engineers.
PS: A simple clue for raising DBMS-ignoramus alerts - look out for the word persistence (e.g. our persistence layer).
One of the best ways to motivate programmers to work is to have a clean and beautiful codebase. It's a great investment. An open-source product with fewer features but beautiful code is preferable for a team to customize, I think.
Most DBAs don't know programming, so they don't answer these questions correctly (which is why I don't hire DBAs: 95% are worthless). A few facts I'm surprised no one mentioned:
1. In the same database connection, you CANNOT block yourself. I've seen lots of people chase their tails over this.
2. Many developers use multiple database connections. This allows them to block themselves in the same logical set of code. If they simply reused connections, the blocking can go away.
3. Try to always read and update your tables in the same order (see the sketch after this list). This is old school, but just about no developer or DBA enforces these rules anymore.
4. Using NOLOCK in ALL instances is just plain wrong. Use it when you know it doesn't matter... like all things, you've gotta know what you are doing to do it well.
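A minimal sketch of point 3; the [Users] table and its columns are hypothetical, invented for illustration:

DECLARE @PostId INT, @UserId INT;
SELECT @PostId = 42, @UserId = 7;  -- hypothetical ids

-- every transaction touches [Posts] first, then [Users]; two transactions
-- that disagree on this order can each grab one lock and wait forever on the other
BEGIN TRANSACTION;

UPDATE [Posts]
SET [AnswerCount] = [AnswerCount] + 1
WHERE [Id] = @PostId;

UPDATE [Users]
SET [LastActivityDate] = GETDATE()
WHERE [Id] = @UserId;

COMMIT TRANSACTION;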
I encountered the same problem last week. Whenever two threads ran the transaction simultaneously, we got a deadlock. After reproducing the error and finding no solution, I placed a lock in the C# code so that no two threads execute that transaction at the same time. This eliminated the problem, but at the cost of reduced parallelism. The idea of retrying could have backfired, as there could have been an issue of thread starvation. Any comments or a better idea?
There isn't any nolock option in Oracle, because readers don't block writers. (And Oracle does not support dirty reads, because there is no reason to do so.) Not all RDBMSs work the same way. This is a prime example of that.
Just because dirty reads don't exist doesn't mean that readers don't block writers. Dirty reads mean that readers aren't blocked by writers. Readers DO NOT block writers in SQL Server.
The one exception in SQL Server where readers block writers is when the reading is done within a transaction. That is - if a row is read in a transaction it is considered part of the transaction, but that is logical.
You wouldn't want to start a withdrawal process, read the balance, and proceed with the transaction, only to find that the balance was updated during the final part of the transaction! Instead, the second process trying to reduce the balance should wait until this transaction finishes.
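In T-SQL, that read-then-write pattern is typically protected by taking an update lock at read time; a sketch, with a hypothetical [Accounts] table:

DECLARE @AccountId INT, @Amount MONEY, @Balance MONEY;
SELECT @AccountId = 1, @Amount = 50;  -- hypothetical values

BEGIN TRANSACTION;

-- UPDLOCK holds an update lock from the read onward, so nobody else
-- can change the balance between our read and our write
SELECT @Balance = [Balance]
FROM [Accounts] WITH (UPDLOCK)
WHERE [Id] = @AccountId;

UPDATE [Accounts]
SET [Balance] = @Balance - @Amount
WHERE [Id] = @AccountId;

COMMIT TRANSACTION;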
We had a similar problem where deadlocks were happening, and we found that it all came down to the MAXDOP (maximum degree of parallelism) setting.
We have 90 GB databases running super fast with no drop in performance after setting this value to 1. In fact, we recommend it now. We also get zero deadlocks as a result.
I would definitely try this.
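For reference, that server-wide setting looks like this (a sketch; weigh it carefully, since it also disables parallelism for queries that benefit from it):

-- 'max degree of parallelism' is an advanced option; expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- cap every query at a single CPU
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;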
On one high-traffic site we are managing, we discovered an interesting issue. We had a bug with the view counter on posts.
Initially we read the "post" row, which held the viewcount. Then in the application we did viewcount = viewcount + 1. Finally we updated the DB with the new viewcount.
Issue: massive deadlocks.
We looked at the stats of the DB and discovered that many UPDATEs were trying to write the same value: the viewcount. The app does many reads very quickly and piles up subsequent writes with the *same value*.
And that doesn't please PostgreSQL at all.
We removed all our issues by incrementing the viewcount directly in the SQL statement.
UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0
Have you tried this:
UPDATE [Posts]
SET [AnswerCount] = [AnswerCount] + 1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0
I'd guess you used to do this:
select AnswerCount from .... ;
...increment AnswerCount in application logic
update ... set AnswerCount=NewAnswerCount;
The problem there is that the default isolation level in Postgres allows the first select to read data without any locks (well, it just gets an MVCC snapshot when it starts - what Jeff wants to be the default in SQL Server :-), so concurrent executions of that transaction block will read data before other transactions have completed.
One fix is to run that transaction at the SERIALIZABLE isolation level (in which case PostgreSQL will abort the conflicting transaction and make you restart it).
Or in this case you could simply
select AnswerCount from .... for update;
which will do the proper locking, so concurrent execution of that transaction will see a coherent snapshot of the data, and will wait for other transactions to complete.
OK, but why doesn't SQL Server re-issue the transaction on my behalf, then? Automatic retries of some kind?
Jeff, would you expect any other API to retry a command if it failed? It would seem like this is a cleanly written API. I think of SQL as an API to SQL Server (or any other RDBMS). It should be up to the client to handle the error.
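To illustrate, a minimal retry sketch in T-SQL, using SQL Server 2005's TRY/CATCH; error 1205 is the deadlock victim error, and the update is the post's simplified example with a hypothetical id:

DECLARE @p0 INT, @attempts INT;
SELECT @p0 = 42, @attempts = 0;  -- hypothetical post id

retry:
BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE [Posts]
    SET [AnswerCount] = [AnswerCount] + 1
    WHERE [Id] = @p0;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

    SET @attempts = @attempts + 1;

    -- 1205 = "chosen as the deadlock victim"; rerun the transaction, as the message says
    IF ERROR_NUMBER() = 1205 AND @attempts < 3
        GOTO retry;
END CATCH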
SQL Server isn't acting up here, it's just doing what LINQ To SQL told it to do. It sounds like READ COMMITTED SNAPSHOT will solve the deadlock issue, but it also sounds like it's a pretty inefficient workaround because it's making copies of all data as it's written.
I'd recommend looking at the LINQ To SQL usage. As I understand it, you're using a static DataContext for the controller. From what I've read (and in my experience), it's better to create a new DataContext for each data operation. That may sound inefficient, but it sounds a lot more efficient to me than using a database setting which alters the way data is physically stored to create multiple snapshots of the data as it's written.
Too bad that you don't seem to notice why a SELECT starts a transaction. A transacted READ makes sure you don't have dirty reads. Your example about out-of-date data is one that isn't that important, as it's the same as caching.
What if you have a rule in a table, say an ordering column whose values must be unique within a set? For example, a table that contains multiple menu lists where each list item's position is unique per list. Doing a dirty read on that table *could* show multiple items having the same order value, as an update *could* have changed the data during the read.
This of course depends very much on the type of data you have, how (de)normalized the data is and how you manage transactions with methods like optimistic concurrency control.
The same applies when you READ and then WRITE. For example: sum all the order lines, add VAT, and store the total on the order for quick reads. You do want to make sure that no order lines are altered during this read.
When you know that certain data will never be updated, like logging data, then you can do dirty reads, as those will not cause harm.
When you get deadlocks, you are not using the appropriate locking and/or isolation levels, which is definitely an application design error. Just don't tell people to simply perform dirty reads; that really makes me lose faith in your blog.
Still... databases provide integrity, and features to circumvent/loosen that model if you know that what you are doing will not break integrity. If you don't know what you are doing, then don't play around with the transaction isolation levels. Instead, increase performance by doing fewer reads (introduce caching) and add versioning to records to perform optimistic concurrency control without sacrificing integrity.
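A sketch of that record-versioning idea in T-SQL; the table and variable names are hypothetical, and rowversion is a column type that SQL Server bumps automatically on every write:

-- each row carries a version stamp maintained by the server
CREATE TABLE [Orders]
(
    [Id] INT NOT NULL PRIMARY KEY,
    [Total] MONEY NOT NULL,
    [RowVer] ROWVERSION
);

DECLARE @Id INT, @NewTotal MONEY, @RowVerWeRead BINARY(8);

-- optimistic update: succeeds only if the row is unchanged since we read it
UPDATE [Orders]
SET [Total] = @NewTotal
WHERE [Id] = @Id AND [RowVer] = @RowVerWeRead;

IF @@ROWCOUNT = 0
    RAISERROR ('Row changed under us; reload and retry.', 16, 1);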
Actually, it's quite easy to deadlock on Oracle: it happens quite frequently the second you open your wallet to pay for it and realize the price doesn't include the guru you also need to pay just to set it up decently.
Being real, deadlocks don't have anything to do with reads. They have to do with the acquisition order of locks and they can happen in Oracle just as in any other db.
If two transactions acquire locks on shared resources in different orders, then it becomes possible for each to grab a lock on a resource the other needs to finish, and both processes become deadlocked. That Oracle doesn't lock reads simply reduces the likelihood of this happening. That it does MVCC by default also means the write locks are extremely short: just long enough to change the pointer from the old version to the new version, rather than locking during the write itself.
To commit a write, they still have to ultimately acquire a write lock on the current row long enough to swap in the new version, and hold it until all other locks needed for this transaction to commit are successfully acquired. Multi-version concurrency control allows multiple writers as well, but ultimately only one will acquire the lock and succeed with the update, and the other will be aborted. If those two transactions acquire their locks in different orders and each succeeds in locking a resource the other needs to finish, you have a deadlock.
Deadlocks are a logic problem, not a software problem. If you're getting deadlocks it's because your code is flawed and your transactions are not acquiring locks in the same order, i.e. first table A, then table B, then table C. What the order is doesn't matter as long as they all do it the same.
MVCC just happens to shorten lock time so much, both by eliminating read locks and by allowing multiple writers with extremely short write locks, that many such flaws in logic are never exposed, because the likelihood of them happening is so rare. SQL Server locks during the write to disk by default, so the lock time is long; Oracle writes everything to disk first, all the new versions, then acquires locks just long enough to swap in the new versions.
Oracle chooses the correct approach and just suffers the performance penalty for always doing row locks and MVCC in order to achieve the correct behavior.
SQL Server 2005 or better can be made to behave correctly; it just isn't the default behavior. MVCC is expensive, so it's off by default, because they don't want 2005 to appear slower than previous versions. That was a mistake: it should be on by default. Correct behavior is more important than any appearance of speed.
In Oracle and PostgreSQL, a read will never block, since they both use a form of multiversion concurrency control. This feature removes a major cause of database programming suckage.
Assuming this is not an April Fool's type joke blog entry: the author states "I'm no DBA" and then proceeds to give DB-related advice? Seriously? Even to the point of saying "But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems." Never *SEEMS* to lead to problems? Really? These are to be taken as trustworthy statements? And shouldn't the Google search on SQL Server vs. MySQL deadlocks have led to the conclusion that at least twice as many companies use SQL Server as MySQL? (And that doesn't even take into account the fact that MySQL has only supported transactions for a relatively short time in comparison to SQL Server.) And no mention of the increased tempdb activity trade-off when using snapshot isolation?
I am very glad that this guy doesn't work for my bank. And, developers like this (no matter how good they are at application code and/or architecture) really do help keep DB folks like myself very well paid. Bless his heart ;-).
PS. with regards to one of the responses above (from wds): this is not an insult to the author's intelligence but rather a response to the dismissive attitude that databases are so simple that anyone who is a good developer is automagically qualified to do data modeling, solve complex DB problems, and even give DB-specific advice.
Love the rant... just had a similar problem. It caused me to lose some sleep and some hair.
The problem I had was a simple insert stored procedure - when I say simple it was basically "insert into tablename values ....". The deadlock was between, you guessed it, two processes running this stored procedure.
Do you know why? The table was set up with a primary id that was an autonumber field (i.e. seed, increment). On insert, MS SQL has to lock the previous record while another update is taking place concurrently... presto, deadlock.
I thought it was my so-called 'lack of SQL knowledge' in writing stored procedures. But it was a lack of knowledge in database design - not that I am a DBA (I am a software developer).
There are a few workarounds; having another table is one of them. But I chose the easiest option: make the column type 'uniqueidentifier'. Problem partially solved; now it is time to inform my so-called SQL Admin Expert.
PS. This is SQL Server 2005; not sure if this is still a problem in 2008.
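A sketch of that workaround, with a hypothetical table; note that random GUID keys fragment a clustered index, which is why SQL Server 2005 also offers NEWSEQUENTIALID() as a column default:

-- NEWID() supplies the key instead of an IDENTITY column, so inserts
-- don't contend over a single "last row" hot spot
CREATE TABLE [AuditLog]
(
    [Id] UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    [Message] NVARCHAR(500) NOT NULL,
    CONSTRAINT [PK_AuditLog] PRIMARY KEY ([Id])
);

INSERT INTO [AuditLog] ([Message]) VALUES (N'first post');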