Deadlocked!

August 24, 2008

You may have noticed that my posting frequency has declined over the last three weeks. That's because I've been busy building that Stack Overflow thing we talked about.

It's going well so far. Joel Spolsky also seems to think it's going well, but he's one of the founders so he's clearly biased. For what it's worth, Robert Scoble was enthused about Stack Overflow, though it did not make him cry. Still, I was humbled by the way Robert picked this up so enthusiastically through the community. I hadn't contacted him in any way; I myself only found out about his reaction third hand.

That's not to say everything has been copacetic. One major surprise in the development of Stack Overflow was this recurring and unpredictable gem:

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Deadlocks are a classic computer science problem, often taught to computer science students as the Dining Philosophers puzzle.

dining-philosophers-problem-comic.

Five philosophers sit around a circular table. In front of each philosopher is a large plate of rice. The philosophers alternate their time between eating and thinking. There is one chopstick between each philosopher, to their immediate right and left. In order to eat, a given philosopher needs to use both chopsticks. How can you ensure all the philosophers can eat reliably without starving to death?

Point being, you have two processes that both need access to scarce resources that the other controls, so some sort of locking is in order. Do it wrong, and you have a deadlock -- everyone starves to death. There are lots of scarce resources in a PC or server, but this deadlock is coming from our database, SQL Server 2005.

You can attach the profiler to catch the deadlock event and see the actual commands that are deadlocking. I did that, and found there was always one particular SQL command involved:

UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0

If it detects a deadlock, SQL Server forces one of the deadlocking commands to lose -- specifically the one that uses the least resources. The statement on the losing side varied, but in our case the losing deadlock statement was always a really innocuous database read, like so:

SELECT *
FROM [Posts]
WHERE [ParentId] = @p0

(Disclaimer: above SQL is simplified for the purpose of this post). This deadlock perplexed me, on a couple levels.

  1. How can a read be blocked by a write? What possible contention could there be from merely reading the data? It's as if one of the dining philosophers happened to glance over at another philosoper's plate, and the other philosopher, seeing this, screamed "meal viewing deadlock!" and quickly covered his plate with his hands. Yes, it's ridiculous. I don't want to eat your food -- I just want to look at it.

  2. We aren't doing that many writes. Like most web apps, we're insanely read-heavy. The particular SQL statement you see above only occurs when someone answers a question. As much as I want to believe Stack Overflow will be this massive, rip-roaring success, there just cannot be that many answers flowing through the system in beta. We went through our code with a fine tooth comb, and yep, we're barely writing anywhere except when users ask a question, edit something, or answer a question.

  3. What about retries? I find it hard to believe that little write would take so incredibly long that a read would have to wait more than a few milliseconds at most.

If you aren't eating -- modifying data -- then how can trivial super-fast reads be blocked on rare writes? We've had good results with SQL Server so far, but I found this behavior terribly disappointing. Although these deadlocks were somewhat rare, they still occurred a few times a day, and I'm deeply uncomfortable with errors I don't fully understand. This is the kind of stuff that quite literally keeps me up at night.

I'll freely admit this could be due to some peculiarities in our code (translated: we suck), and reading through some sample SQL traces of subtle deadlock conditions, it's certainly possible. We racked our brains and our code, and couldn't come up with any obvious boneheaded mistakes. While our database is somewhat denormalized, all of our write conditions are relatively rare and hand-optimized to be small and fast. In all honesty, our app is just not all that complex. It ain't rocket surgery.

If you ever have to troubleshoot database deadlocks, you'll inevitably discover the NOLOCK statement. It works like this:

SELECT *
FROM [Posts] with (nolock)
WHERE [ParentId] = @p0

It isn't just a SQL Server command -- it also applies to Oracle and MySQL. This sets the transaction isolation level to read uncommitted, also known as "dirty reads". It tells the query to use the lowest possible levels of locking.

But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think:

In theory there is no difference between theory and practice. In practice there is.

I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.

But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems. I asked around, and I got advice from a number of people whose opinions and experience I greatly trust and they, to a (wo)man, all told me the same thing: they've never seen any adverse reaction when using nolock. As long as you know what you're doing. One related a story of working with a DBA who told him to add nolock to every query he wrote!

With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it's never wrong or garbled or corrupted in a way that will crash you. And honestly, most of the time, who cares? If your user profile page is a few seconds out of date, how could that possibly matter?

Adding nolock to every single one of our queries wasn't really an option. We added it to all the ones that seemed safe, but our use of LINQ to SQL made it difficult to apply the hint selectively.

I'm no DBA, but it seems to me the root of our problem is that the default SQL Server locking strategy is incredibly pessimistic out of the box:

The database philosophically expects there will be many data conflicts; with multiple sessions all trying to change the same data at the same time and corruption will result. To avoid this, Locks are put in place to guard data integrity ... there are a few instances though, when this pessimistic heavy lock design is more of a negative than a positive benefit, such as applications that have very heavy read activity with light writes.

Wow, very heavy read activity with light writes. What does that remind me of? Hmm. Oh yes, that damn website we're building. Fortunately, there is a mode in SQL Server 2005 designed for exactly this scenario: read committed snapshot:

Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.

I'm a little disappointed that SQL Server treats our silly little web app like it's a banking application. I think it's incredibly telling that a Google search for SQL Server deadlocks returns nearly twice the results of a query for MySql deadlocks. I'm guessing that MySQL, which grew up on web apps, is much less pessimistic out of the box than SQL Server.

I find that deadlocks are difficult to understand and even more difficult to troubleshoot. Fortunately, it's easy enough to fix by setting read committed snapshot on the database for our particular workload. But I can't help thinking our particular database vendor just isn't as optimistic as they perhaps should be.

Posted by Jeff Atwood
232 Comments

The last place I worked had a strict nolock on every query policy.

Steve on August 25, 2008 11:37 AM

I find that the fact that most databases treat everything like a banking system seems to be a problem that affects performance greatly. The percentage of applications that require that level of transactional stability is extremely small. If you remove the locking, and assume that reading old data is not detrimental, then you can speed things up quite a bit.

Eric Kibbee on August 25, 2008 11:41 AM

This is all well and good, but I would refer you to the following post (you may not be familiar with the blogger, but he makes a good point):

http://www.codinghorror.com/blog/archives/001079.html

If you have a simple bit of code with relatively low usage, and you're getting deadlock, then I'd be very concerned that something very bad is going on under the surface. Closing your eyes, applying a bandaid and hoping that the underlying issue goes away is the road to lots of future pain.

Daniel Blumenthal on August 25, 2008 11:43 AM

Why not just cache some of the reads into memcached? should minimize stress on the DB and thus make deadlocks less likely.

Andy on August 25, 2008 11:47 AM

To quote:

Part 1:

UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0

Part 2:

SELECT *
FROM [Posts]
WHERE [ParentId] = @p0

And, the problem is that you are seeing a deadlock...

First, we must consider WHY this is the case... Note that a SINGLE lock can't deadlock, unless the resource is never returned. Neither of these SQL statements qualifies as an infinite loop. So, we must presume at least two resources. Note that, equally, we cannot assume NO locks, as that cannot deadlock either.

A possible answer to this is that there are separate READ and WRITE locks (there must be at least two locks). Any number of READ locks can be held by readers of a table. A WRITE lock is then exclusive (The WRITE lock can be obtained if there are no READERs). This would make sense; it is a normal design pattern.

This gives us the necessary two resources. Now, the READ resource should be acquired by statement 2. The WRITE resource should be acquired by statement 1. Aquiring WRITE of course implies ownership of READ as well. Note that the second statement is an UPDATE, which will of course use READ -- but it isn't needed because the WRITE resource implies the READ resource. Indeed, it would deadlock if this were not the case.

If the two resources (READ and WRITE) were not acquired atomically (relative to each other),then there may be a subtle race condition within the SQL service itself. Of course, this implies that the resources are implemented as two separate resources.

Another issue could be that the SQL optimizer could be mis-compiling the request and issuing a sequence like aquire(WRITE) ... do something ... aquire(READ) ... which would also lead to the deadlock.

If the optimizer ASSUMED that no updates where going on, it may begin with aquisition of READ, and begin the transaction. Later, it may determine a result set, and attempt to aquire WRITE -- which of course would fail (this would be a nasty bug indeed). Of course, it may assume that IT owns READ, and thus attempting to aquire WRITE is ok (and this naive optimization would work, AS LONG AS THERE ARE NOT CONCURRENT READERS).

Actually, there may be other causes to this. But, (and this is important) - if the problem is as a result of select/update, it is probably a defect within the database itself.

Indeed, there is NO direct solution to this deadlock, under the circumstances you have outlined. There is, however, a work-around.

Ensure that the database is NEVER UPDATEd if concurrent reads could be going on. Writes are fine, Deletes are fine, but ANY UPDATE would be suspect with this database library.

Alternatively (and my STRONG recommendation is to immediately get rid of the database), use explicit locking.

ratboy666 on August 25, 2008 11:55 AM

Seems to me that anyone doing any kind of database work for more than a few minutes has come across this deadlock situation. Locks on a database table are never taken because of a write, they are always taken because of a read; reads are what lock database table rows. Again, this is a fundamental concept of database design that I would expect a seasoned developer to know.

MS SQL Server's default isolation level (which controls the locking scheme) is SERIALIZABLE. Which means every read locks the rows that are read until the transaction commits. For reference, MySQL's default isolation level is REPEATABLE-READ, one down from SERIALIZABLE, but would also cause the deadlock situation described in this article.

Since nolock hints in MS SQL Server can be ignored by the query optimizer, it is advisable to change the isolation level instead, which can usually be done either in clint code (the isolation level can be set in the TransactionScope object in .net), in a stored procedure, or often at the connection object level. In this scenario, the READ UNCOMMITTED isolation level is perfectly acceptable since, as stated, no one really cares if the post count of a particular user is off by one or so (and the data is not used in a subsequent transaction, which is KEY) and will save you some memory on the database server (in contrast to READ COMMITTED SNAPSHOT). In a banking scenario, however, I'm sure the bank (but not the customers) would be somewhat perturbed if a customer made two quick money transfers between two of his accounts and suddenly had double the amount of money he started out with.

Lastly, this is why database transactions should be deliberately designed rather than be the bastard step-child of the code's business layer use-cases.

Robert C. Barth on August 25, 2008 11:56 AM

In my previous comment, I must ammend my first paragraph such that it is in the context of the deadlock situation given in the post. The update of course attempts to lock the row which is what eventually manifests the deadlock situation, however, it is not the root cause of the problem. The root cause is the locks aquired during the initial read and a subsequent read of the identical rows, with the intervening update waiting on the initial read's lock to release. Since the data read in the read situations is not used in a subsequent database transaction to update any data, it is perfectly acceptable to permit dirty reads in that situation.

Robert C. Barth on August 25, 2008 12:04 PM

Jeff,

I understand your concerns - you may be able to tolerate slightly older data,because most likely you only have ONE instance of your webapp that sees the old data.

Problems will occur if suppose you have multiple instances of your webapp, where you can run into a problem if your multiple instances see different data [or versions of the data] from the database. Then you run into data consistency issues.

When you have multiple application servers, you could run into a problem where at a given time some
I can tell that web applications use the DB as shared memory, that they rely on the fact that once an update is done

You have to pick a locking strategy for your applications.

kashif on August 25, 2008 12:05 PM

Of course, as much as everyone likes to talk about banking transactions, the reality is that banks heavily optimize their systems to not have these problems most of the time anyway. The majority of deposits and withdrawals in banking systems do not happen in real time. Most of the transactions begin at midnight, with deposits running first for a given account, followed by withdrawals. You have problematic categories like deposits that come from another account within the bank, but those are probably still handled as seperate transactions with the appropriate penalties if someone doesn't have the funds to cover their withdrawals after their deposits are processed.

In other words, in real time your bank statements don't do a whole lot anyway, and most people aren't checking their account balance at midnight.

Of course, investment banking and similar ventures are different, but even they have long periods of inactivity in a given 24-hour period that permit them to do large batch processes that have fewer chances of locking rows and tables in a database.

Vizeroth on August 25, 2008 12:05 PM

Is this post for all those who dabble in code without ever havign set foot in a CompSci class?

Is this the *real* audience for stackoverflow?

If you are getting deadlocks in your app, most likely cause is poor application design. Try queuing requests, get with mutexes, ... it has been done before.

BugFree on August 25, 2008 12:09 PM

Liar! Posting has declined because you are too busy twittering! You my friend have become a twitter whore.

BLACK HAL0 on August 25, 2008 12:12 PM

Well, MySQL was certainly much more optimistic, to the point of not handling transactions at all. That has changed a while back (a very recent while, IMHO). Whether they do optimize for lots of reads and few writes is another story.

And speaking of stories, normalized databases are optimized for heavy reads, low writes. Is it that practical experience teach people not to normalize their databases? Mmmmm.

As for deadlocks, the write lock can't be taken until all shared locks are released... or made to release. Emphasis on all, not on the time each one takes.

Daniel on August 25, 2008 12:12 PM

If you've got a SELECT that operates on tables A and B and an UPDATE
that operates on the same ones, then the SELECT will grab a lock on
table A, the UPDATE will grab table B, and then both will sit there
while they try to get a lock on the other table.

The database should be smart enough to handle that deadlock. It should just execute the first incoming query first and then the other. It may be complicated to implement (or they would have already), but i should be possible.

N on August 25, 2008 12:13 PM

Why do read locks by default? So people that don't understand the issues with dirty reads don't sit around for hours trying to figure out why they get inconsistent results.

In the examples given I wonder why there would be a common set of rows returned?

Al on August 25, 2008 12:14 PM

Where is the value of the [AnswerCount] coming from?

Josh on August 25, 2008 12:17 PM

* I work with MySQL/PostgreSQL/Firebird.
Joel F
is the only mention of Postgresql from all these entries. I would like to know why no one is using it. I find it much easier than SQLServer, its free, and it an awesome optimizing tool.

Mark on August 25, 2008 12:17 PM

forgot the word has between an awesome

Mark on August 25, 2008 12:18 PM

There is more than just a dirty read issue when using NOLOCK.

Rare, but you can actually get 2 records instead of one if your NOLOCK'd query executes during a page split operation.

Ed Stegman on August 25, 2008 12:19 PM

DB2 also has an unconditional read (WITH UR) that you can add to the end of your SQL.

slapout on August 25, 2008 12:21 PM

As we know from reading this blog for many years:
1. SQL Server is not broken
2. You, the developer, have caused this problem

I love the comments on this one - assumptions, conjecture, talk about unrelated things. A deadlock is an extremely rare and specific condition. It is not caused by inadequate hardware, or even 'over-adequate' hardware. I seriously doubt it could be cause by LINQ either.

There is a very specific situation that causes this, and I think the wrong queries have been focused on here. I can't imaging the select statement causing any kind of deadlock condition, because it only locks one row at the least, and one table at the worst. There has to be two locks on separate objects in both transactions (total of 4 locks - 2 complete and 2 waiting), for a deadlock to occur. SQL Server does not falsely report deadlocks. If it says there is a deadlock, there is. I love the way developers always take these discussions into unrelated territory - there's a blog for you Jeff.

Jasmine on August 25, 2008 12:22 PM

From my experience SELECT's almost never could cause deadlocks, no matter with or without nolock hint. You should carefully look at your UPDATE statements to see if they could cause locking.

Andrey on August 25, 2008 12:23 PM

Use Memcache to avoid doing db reads (often), which should drastically reduce the possibiliy of getting deadlocks, and you're app will run much quicker.

meow on August 25, 2008 12:30 PM

I'm confused. NOLOCK is the default for select operations. I'd read this before and confirmed it now.

Brad on August 25, 2008 12:30 PM

I get that, but I don't understand why simple read SELECTS need to lock *at all*.

The classic example is the store clerk who uses the database to check his inventory to make sure he has a particular widget in stock before making a sale. He checks and sees one in stock, so he goes ahead and makes the sale. While he is entering his order details, another salesman finalizes an order for the last item. Now he can't fill his order in time.

That explains the basic problem, but it doesn't quite satisfy in this case. After all, the example implies an inventory transaction of some kind, and you aren't specifically including any transacations.

An example closer to your particular app would be the sql required to display a particular question page. I'm not in the beta so I can't see what it looks like, but from the description you must have some code to get a list of responses associated with a particular question for inclusion in the page. Not only that, but updating a question will _always_ cause this code to run at nearly the same time as the update, so that the new page can be shown to the user.

Now let's say part of the retrieval code joins in another table. If it joins anything at all, it needs to do some locking for both tables in the join to make sure the join is composed correctly. Otherwise something might change while the join is in progress that could change things. And if you kick off your update in a separate thread to avoid bogging down your display code, the chances of them needing the same data at the same time are actually pretty high.

Joel Coehoorn on August 25, 2008 12:30 PM

I guess what I'm saying is that Starbucks Doesn't Use Two-Phase Commit

..and Ebay doesn't use transactions
http://www.infoq.com/news/2008/06/interview-shoup-ebay

Jim on August 25, 2008 12:38 PM

OK, but why doesn't SQL Server re-issue the transaction on my behalf, then? Automatic retries of some kind?

Because the transaction happens on the code, not on the database. For example:

begin transaction
x := select availability from inventory where item_id = XYZZY
y := select sum(quantity) from sale where item_id = XYZZY and sale_id = XYZZY
x := x-y
update inventory
set availability := x
where item_id = Y
end transaction

Ok, so you got through the first read and then deadlocked in the second. How can the database retry it for you? You already got the first read inside x. The database might recalculate that value, but it can't tell your code to update x.

That's why the code has to retry the transaction.

That's also the why the read causes a lock. Suppose you read the availability, and then someone else tries to update it. If you hadn't locked it in first place, how would you know, at the time you issue the update, whether anyone changed availability or not?

It becomes even more important if the database is not properly normalized, but as my example got big and confusing, I decided to leave it at that. :-)

Daniel on August 25, 2008 12:44 PM

As it turns out, the database you are using can create deadlocks all by
itself, at any old time (well, there are reasons, but it's complicated).

The only thing that can be done is to wrap each sql series into a
sequence that retries the problem -- this is just a manifestation
of the class pc-losering issue.

It is likely that if it is really a read-only table, it won't deadlock,
pending implementation details of the sql library. Throw an UPDATE
into the mix -- you may well be screwed with a deadlock.

The implication is that ALL access (read, write, or update) needs to
wrapped with retry code.

So, go and wrap, be happy. Of course, if you run into a REAL deadlock,
things will get a bit ugly.

ratboy666 on August 25, 2008 12:54 PM

SQL Server is notoriously nasty about locking on reads. I run into these problems all of the time. NOLOCK can certainly help. Adding an index (or two) will usually resolve these issues. What often happens on a SELECT if it doesn't get an index hit SQL Server will do a table scan. Table scans (in addition to being slow) may require a table lock. So you get one user thread accessing a table at a time. Not fun.

craig on August 25, 2008 12:57 PM

Sadly enough, admitting you aren't a database expert draws the ire of DBAs everywhere. Be careful in admitting your own problems online, Jeff.

Hope this isn't what we can expect from StackOverflow. I really love the idea of presenting a problem and watching it followed by a flurry of insults and criticisms.

HB on August 25, 2008 12:57 PM

Long, long ago, I think back in 2002, I had a similar problem with SQL Server. I was also bewildered that SQL Server, which prominently bragged about row-level locking, could have such common deadlock problems.

What I eventually discovered is that although the *table* data was row-level locked, the *index* data was page-level. Since I was usually inserting sequential key data and reads tended to be clustered on recent inserts, that meant that reads were frequently being locked out of the *index* due to nearby inserts updating the index. MS had no useful advice at the time on how to resolve this; all they could say was to minimize index access. I never could figure out how to reduce deadlock to an acceptable level while keeping the indices I needed to maintain query performance.

So I switched to Firebird and the problem went away.

KJB on August 25, 2008 1:04 PM

It's always a little disturbing to see a well-known coder ask a dumb question, but come on, database locks?

Wrong. Dead wrong and incredibly dangerous. And egotistical. If I were interviewing you I would immediately flip the bozo bit and thank you for your time.

It’s impossible to know everything. The hallmark of a good programmer is not what they know but their ability to learn what they don’t. If you do not promote an environment where any question can be asked, no matter how naive and trivial, problems become intractable because people are too afraid to ask for help. Just because someone has been in the industry for a while doesn’t mean they know everything.

Chris on August 25, 2008 1:09 PM

It's actually possible for two SELECT statements to deadlock in SQL Server (at least in SQL Server 2000), if an index on the table is involved. No writes required!

Deadlocks are an almost unavoidable fact of life in any large database application. If NOLOCK works for you, go with it, but you should understand when you can and can't use it safely, and it sounds like you mostly do.

Matt Wigdahl on August 25, 2008 1:15 PM

Are we all only web developers here?

Back in the day when we all wrote client apps, one common them (before disconnected recordsets) was that if you saw the data on your screen then it hadn't changed.

i.e. You locked the data when you loaded that set of data into your view, and then released it when you clicked Next or Update or whatever.

Dave on August 25, 2008 1:17 PM

@Chris - Amen to that...

HB on August 25, 2008 1:19 PM

NO LOCK is used in SQL Server for backward compatibility. To be really hip you want to put this at the front of your SELECT statement:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED.

Put this on every SELECT you have that doesn't do an action and you will speed up everything in your programs, applications, servers, and ssurrounding vicinity. Everytime anything is doing a SELECT it is locking tables.

And don't do any SELECT INTO. That locks the TempDB. Build the table and then APPEND the rest.

Or one row SELECT INTO and then APPEND the rest.

John A. Davis on August 25, 2008 1:34 PM

How can a read be blocked by a write?

Well sure, either your update statement is still running, or the transaction within which the update statement had executed has not yet been committed or rolled back. By default SqlServer's isolation level is read committed, so your query might read uncommitted data otherwise. Did you profile how long the Update takes to execute and whether its embedded inside a longer-running transaction?

Arno on August 25, 2008 1:45 PM

But I can't help thinking our particular database vendor just isn't as optimistic as they perhaps should be.

Making a decision that is at odds with the rest of the technological world? This is *Microsoft* we're talking about, right?

Also, a credit line for the theory vs practice quote that you hacked up a bit:

a href=http://en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheuthttp://en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheut/a">http://en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheut/a">http://en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheuthttp://en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheut/a

Frank on August 25, 2008 1:51 PM

Sorry everyone. That link butchering is on server's end, not mine.

en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheut

Frank on August 25, 2008 1:52 PM

Database deadlocks are not an error condition. At least they shouldn't be, if the rest of your software is coded correctly.

The point here is that database deadlocks can and will happen all the time, they are just not all that likely if your transactions are very short as in a web scenario. But they will happen, regardless of what you do with READ UNCOMMITTED etc., at some point of time two users will try to change the same (or connected) thing at the same time, and there you have it.

If you do any database stuff, please organize your code so that conceptually linked queries and operations are contained in blocks that you can repeat on deadlocks. That is, when serving web requests, put your request handling in a method, and put that into a while loop that retries as long as you get deadlock exceptions. Safeguard against eternal loops by counting the number of retries and aborting if they exceed some N.

This of course requires you to be careful about any variables and other state that might leak from that retrieable method, but it's the only way to do database operations correctly.

Martin Probst on August 26, 2008 2:12 AM

I'm a little disappointed that SQL Server treats our silly little web app like it's a banking application.

Imagine how pissed you'd be if SQL Server treated your banking application like a silly little web app.

Ian Nelson on August 26, 2008 2:31 AM

It's as if one of the dining philosophers happened to glance over at another philosoper's plate, and the other philosopher, seeing this, screamed meal viewing deadlock! and quickly covered his plate with his hands.
Love this! :-)

Although I'm only an amateur DBA (and an ex-amateur-DBA at that!), I've always been under the impression that databases suffered more from the a href=http://en.wikipedia.org/wiki/Readers-writers_problemReaders and Writers problem/a. This would seem to fit your starvation problem better than the Dining Philosophers' Problem, Jeff, as the large number of reads you're doing are locking out the write; it also mirror's ratboy666's comment.

Alastair Smith on August 26, 2008 2:47 AM

SQL server provides lots of opportunities for deadlocks. Whenever I encounter this scenario, I look at my Update/Insert/Delete statements and I add the WITH(ROWLOCK,UPDLOCK) query hints.
This suggests that SQL Server first locate the rows to be updated (UPDLOCK) and then lock the effected rows only rather than the entire page(s) that contains the rows.

Your new statement would look like the following:

UPDATE [Posts] WITH(ROWLOCK,UPDLOCK)
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0

It has worked for me. Good luck!

D Garcia on August 26, 2008 2:50 AM

If you ever stuck with any big problem, you can always ask the community for hints and suggestions.

Varun Mahajan on August 26, 2008 3:14 AM

Too bad you dont use Java: http://java.sun.com/j2se/1.4.2/docs/api/java/sql/Connection.html#setTransactionIsolation(int)

;)

Rubem Azenha on August 26, 2008 3:25 AM

This is an excellent post . Thanks a lot . I am grateful to you .

on August 26, 2008 3:46 AM

Would have been nice to see how many users you have hammering away to get this error. I've never seen this error from MSSQL or MySQL, and I've used them a lot in both read and write heavy environments.

Mattkins on August 26, 2008 4:07 AM

The biggest problem i have seen relating to this scenario is when the statement causes lock escalation in the database. If your first query is running a table scan rather than an index lookup it is conceivable that SQL with give up locking the row or page and lock the whole table instead. If this happens on your update statemtent then all your subsequent transactions will be blocked until the update finishes and you end up with the deadlock problems you are seeing. If it happens on a SELECT statement all your UPDATE transactions will be blocked.

This problem is compounded if your data is very large and one row spans multiple data pages.

So before setting the read committed snapshot I would first check the queries are using index lookups and try and check if the data size of a row could be reduced.

Do you really need every column in the [posts] table?

John Hunter on August 26, 2008 4:40 AM

Don't if I'm to purist, but I avoid the nolock like the death, even that it gets faster, because you could show wrong informations to your user, and it could be realy bad, for example, if you are leading with money. I prefer making the query and passing the values to local variables and then killing the recordset, and consuming the variables in my next query.

Otvio on August 26, 2008 5:42 AM

Interesting article. And while I agree that dirty reads can fit the bill, I think the point is that one makes an informed decision on whether a dirty read is ok. Sure for most website applications few writes and mostly reads. No one cares if their profile is a few seconds out of date or if thaty aren't seeing Bob's post from 1 ms. after they submitted their own, but if your application say tracks an ordering system, it isn't appropriate to count 15 orders when 3 of them are still completing transactions before they can truly be counted. The default locking mechanism is aimed at keeping data-integrity up, not user wait time.

I feel that if it's important enough to be in a transaction in the first place, it's important enough to fix the cause of the deadlock and not just ducttape a dirty read into the query. Having said that, for non-critical data it's fine, just makes me worried when I see it. :-)

Regards!

JMinadeo on August 26, 2008 6:19 AM

If, like most web applications, you are doing mostly reads, and few writes, why use a database at all? Most web apps are just doing the same query over and over again. Just generate and store the content of each page and serve that, regenerate on change. You can use whatever granularity of locking you want that matches how you store the content. I've never understood why so many web applications are just front ends to a big pile of the same SQL queries over and over again, it's never made much sense to me but I guess that's what everyone else does, so why not do it too?

Reed on August 26, 2008 6:22 AM

UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0

I think the update statement is trying to convert a read lock into a write lock. I would look at putting a locking hint on the select that first read the post in the transaction to make it take out a write lock at that point. Otherwise do the update in a separate transaction.

Ian Ringrose on August 26, 2008 6:33 AM

Seems like a case of bad defaults... just reading what read committed snapshot is and it sounds like how it should be implemented by default.

If we request a row that is being modified by some other transaction, our transaction will wait until the other is finished and fully committed.

Sounds like common sense... queueing database requests is hardly rocket science. :P

Clearly, you should be using MySQL. :)

jheriko on August 26, 2008 6:35 AM

Re: choice of SQL Server
He wants to use LINQ, so he must use SQL Server. LINQ doesn't talk to anything else at the present time (there's only one sql-translation provider).


Re: deadlock -
It's not really about query complexity... it's about multiple tables involved in a single transaction in a multiple user environment.

Are there any triggers that the update statement activates that reads other tables?

Is the select issued in its own transaction, or are other tables queried in the same transaction?

David B on August 26, 2008 7:25 AM

Someone has to explain to me sometime - why every time a problem is posed, peoples first response is to tell you that you should use a different platform/technology/brand.

Same thing happened in the comments to having issues with Windows and Wordpress, half the responses were use x blog engine instead, or use linux.

Here it's use x db technology.

And ta for posting this Jeff, I don't get the responses that expect you to always be an expert, and I don't think I'd learn as much if you were an expert and didn't need to pose the problems!

Andrew Tobin on August 26, 2008 7:36 AM

Maybe if you open sourced your webapp, you could gain the benefits of peer code review! ;)

-Some- bugs are shallow with enough eyes...and maybe development would speed along, and then you can get back to writing more posts to enlighten the masses.

Why not open source at least some of the stackoverflow website?

Tim Oxley on August 26, 2008 7:40 AM

MS-SQL is not a particularly strong database for online transaction processing, at least not compared to Oracle. In the mid-1990's, Oracle developed a form of multiversion concurrency control that pretty much devastated the competition: it's sort of like nolock, but safer.

MySQL rocked the world by coming out with the blisteringly fast MyISAM table handler, which traded transaction support for performance. Not having transactions is a real loss, but many people have found MyISAM is reliable enough for what they do. MySQL also supports the InnoDB table handler, which uses MVCC. InnoDB is remarkably fast for ordinary operations and commits, but it can really burn you if you need to rollback a big transaction. I've heard of InnoDB wrecks where it took more than a week to recover from a system crash.

Paul Houle on August 26, 2008 7:40 AM

Evolv CS (http://www.defran.com/evolv-cs.asp) upgraded to SS 2005 and told us to add the nolock option to ALL of our custom queries. So maybe this is common practice.

Joe Beam on August 26, 2008 7:48 AM

We are writing an app that should work at least with Oracle and SQL Server, therefore using non standard hints like NOLOCK is not an option.
It's ok to have propietary extensions but SQL Server should support ANSI standards better. It's incredible that the concatenation operator || is not available. I guess they haven't changed the Sybase core much, that's unfortunate.

lluis on August 26, 2008 7:59 AM

Great examples here of why comments need ratings so you can separate the rantings and insults from the answers. I did learn some things from reading the comments about the differences between database designs.

I've been writing database backed applications for 12 years now- never hit a deadlock on a read. Even in SQL Server 2005. Of course, I generally follow the rules...

matt m on August 26, 2008 8:11 AM

You could always try with read uncommitted

Sql on August 26, 2008 8:30 AM

These types of problems are a big part of why SQL Server introduced MVCC features in the 2005 release. Most of the DB world has moved to MVCC as their default behaviour (Oracle, Postgres, MySQL, Firebird/Interbase, etc) with DB2 being the main holdout that still uses the pure locking model for database transactions.

I suspect they didn't change the default behaviour (thus treating you as a banking site) in order to limit breakage to backwards compatibility. But seriously, for web apps MVCC is the better approach.

Jess Sightler on August 26, 2008 8:36 AM

But instead of storing it into Post table, store number of replies to PostCommentCount table.

The whole point of storing the count instead of calculating it is making reads cheap. So now you are asking for an extra join? Why not just tell him to calculate it each time like the other people who don't get it.

a on August 26, 2008 8:42 AM

Someone has to explain to me sometime - why every time a problem is posed, peoples first response is to tell you that you should use a different platform/technology/brand.


It's because they don't know how to solve the actual problem. This is not a case of a bad product - it is a case of a mistake in programming, and it can be fixed. I'm sure that's why it was posted in the first place. I don't like the solution, but the point is, it was solved without re-working the whole site, changing databases, or anything else drastic - he just added two little words to the query. Simple problem, simple solution.

Jasmine on August 26, 2008 9:19 AM

I agree with some of the comments (though I might phrase it differently) that you would probably be happier if you learned more about databases.

The other thing is, you know the old axiom select(2) is not broken? That if you think you found a bug in the OS, 99% of the time it's in your code, and the other 1% of the time you need to check your code even harder?

Not true with Microsoft products.

Chris D on August 26, 2008 10:01 AM

I suspect that the reason why there aren't many google entries for MySql deadlocks is because the default storage engine in MySql (MyISAM) doesn't have deadlocks.

From the MySql docs at http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html

Table locking in MySQL is deadlock-free for storage engines that use table-level locking. Deadlock avoidance is managed by always requesting all needed locks at once at the beginning of a query and always locking the tables in the same order.

However, this is had at the expense of transactions.

Kyle Cunningham on August 26, 2008 10:27 AM

The deadlock part seems pretty well covered so I'm going to echo the, AnswerCount field is probably a bad idea and should be calculated. sentiment.

Adam on August 26, 2008 10:30 AM

If you have long running transactions, lock all the required data in the beginning to prevent deadlocks:

Thread 1:
begin tran
declare @x int
Select @x = A.x
from A with (holdlock, updlock)
join B with (holdlock, updlock)
on A.id = B.id
where id = 1

update B
set y = @x
where id = 1
commit


thread 2:
begin tran
declare @y int
Select @y = B.y
from B with (holdlock, updlock)
join A with (holdlock, updlock)
on B.id = A.id
where id = 1

update A
set x = @y
where id = 1
commit


If you don't like to join you can write:
select x from A, B where A.id = 1 and B.id = 1


------

And please never use NOLOCK.

Using the NOLOCK index hint can result in phantom data. When you query a table while another process is inserting a set of data in the table and the insert fails (duplicate key error maybe), the select query can return data that was never committed to the database.

Filip on August 26, 2008 10:33 AM

SQL Server - That is MySql, Progresql, Oracle, MS-SQL or any other SQL server

MySql Server is MySql

MS-SQL Server/Microsoft SQL Server, however is Microsoft's SQL Server,

The Generic name has been appropriated by Microsoft for marketing reasons, Various SQL Servers (including the original engine Microsoft used) are older than Microsoft the company ....

Deadlocking on dirty reads on a web interface that does not really care if the information is a few seconds out of date sound like the perfect case for using nolock ....


Jaster on August 26, 2008 10:43 AM

The deadlock issue has nearly been beaten to death, but I want to add one more thing. I think there has been a lot of confusion in the discussion about the isolation levels that Jeff is talking about using. NOLOCK is equivalent to READ UNCOMMITTED, but if you read the entire post, that's not what Jeff is talking about using. He's planning to use READ COMMITTED SNAPSHOT, which does no dirty reads. With snapshot isolation enabled, SQL Server creates a duplicate copy of the data page before beginning any transaction that includes a write operation and purges it after the transaction is committed; if that copy exists, SQL Server reads from the copy. That eliminates the locking issue, and therefore should eliminate the deadlocks. It does require more disk resources, but in my experience, snapshots have been very fast. I wouldn't use a dirty read unless it was absolutely necessary - all sorts of nasty side effects can come of that if you're not very, very careful - but this isn't a dirty read. You're still reading the data that has already been committed to the database.

Jeff, if you haven't done it already, it would be worthwhile to use SQL Profiler to run a trace on your database while the app is running. This has helped me sort out a number of issues since it tells you everything the database is doing. You may find a transaction that LINQ is creating implicitly that you weren't aware of, and even if you do choose to use snapshot isolation, correcting a transaction like that may help the performance of the application.

Good luck!

Ed

Ed Leighton-Dick on August 26, 2008 11:30 AM

You need to read this one Jeff. You happened to mention you are using LINQ To SQL for your design. Please tell me you didn't use the Linq To SQL designer in Visual Studio to define your database tables. I'm sure LINQ To SQL works just fine if you write the code yourself, but using the designer is going to make your site SLOOOOOW!.

I recently developed a Virtual Tour posting system for real estate agents at my job and I used the Linq To SQL designer to define my tables to start with (programming in C#). I never could find out why a single page would take seconds to build (1.93s it turns out). Finally I used the trace feature and found the DataContext.ExecuteMethodCall procedure run from the designer code was taking .73 seconds to retrieve ONE row from the database, from a table with ONLY 17 rows, used a stored procedure that was ONLY a simple select statement, even with DataContext.ObjectTrackingEnabled set to FALSE.

That method uses a lot of reflection and testing of relationships to insure the data is valid. I rewrote the Data Access Layer of my system, still using attributes and reflection for parameter types, inherited their IExecuteResult interface to add a little more flexibility, and now that same call takes 0.000434 seconds, and now the page builds in 0.034 seconds. I know it sounds like an exaggeration, but I am dead serious. MS' code for that method must be incredibly slow. And so you don't think it is my system, our server is a twin dual-core (4 cores) Xeon 2.8GHz, 8GB RAM, Window Server 2008 Enterprise, SQL Server 2005 Professional, 1.6TB RAID 5 array. I could watch the page peg the processor at 26% (their code only using one thread) for over a second in the task manager.

Thought you need to know that, but only if you used the designer in Visual Studio.

Christopher Tillman on August 26, 2008 1:02 PM

The deadlock does not occur when an operation takes too long on a resource and another is waiting for the same resource.

It means, instead, that a query, of any nature, is waiting for a resource that is taken by another query, that is waiting for a resource of the first one. obvously there's no solution for this, if none of the query explicitly dies a horrible death, so that the other one can continue.

In SQL Server, there're two ways of accessing the data: Read Committed and Read Uncommitted, the first being the default (and yours).
In Read Committed, each query, even a Select, must access only data that has been completely written and processed and is not being currently modified or handled by another query. That ensures that your data is always the REAL data, and not some old data you've already overwritten during the select itself.
Read Uncommitted, on the other hand, tells the Select to look for anything it can give you right now, even if it's being updated by some other query during the select itself. I leave you to the documentation on how to switch the two modes.

Anyway there's another way: you can use a
SELECT *
from BLAHTABLE nolock

the nolock will tell the select to ignore any lock, like if it was a read uncommitted environment. Note that you can't use a nolock in an update, but you can still reduce the risk of a deadlock by using a rowlock instead. Beware rowlock will have an impact on the performance though.

AkiSkyo on August 26, 2008 1:37 PM

2) The root cause is not using Oracle. If you are using Oracle and still get a deadlock, you have an unindexed foreign key and failed to heed the warnings in TOAD or OEM. I only need one hand to count the number of deadlocks I've encountered in a decade of using Oracle. All had obvious causes and were discovered early in development.

I guess you haven't tried running Oracle in full SERIALIZABLE isolation mode then. Try that and say hello to Cannot serialize the transaction error unless you're doing just really, really small transactions.

But overall, deadlocks and serialization errors are a fact of life. You just try again and hope you succeed. Happily, in our framework, we just rollback the transaction and call the continuation that was saved when the transaction was started.

TT on August 26, 2008 1:52 PM

I've never used MS SQL Server, I've used oracle for over a decade.

As has been metioned, in Oracle readers don't block writers, and writers don't block readers.

A thread above suggested that MS SQL Server was better because it turned off the 'snapshot' feature that enabled this behaviour by default : personally, I'd rather have correct than fast in the first instance anyday...

I have seen a couple of Oracle deadlocks in my day, always caused by U1 updates A and tries to update B, while U2 updates B and tries to update A.

JAycee on August 27, 2008 2:47 AM

SQLServer is a work in progress for Microsoft compared to Oracle - from plenty experience deploying to both platforms we've had MUCH better performance, much fewer deadlocks and fewer headaches in general with Oracle.

Frank on August 27, 2008 4:20 AM

Has anyone mentioned triggers?

I usually avoid triggers because 'tis easy to forget they exist and get included in any transactions you might generate. So that simple SELECT or UPDATE might be introducing a second table for the deadlock through a trigger.

Harbour Master on August 27, 2008 4:52 AM

I amend my previous statement - no SELECT triggers in MS SQL.

Harbour Master on August 27, 2008 4:57 AM

How about this case?

CONNECTION 1: Update BankAccount Set Balance = Balance - 10 WHERE ID =1
CONNECTION 2: select Sum(Balance) From BankAccount
CONNECTION 1: Update BankAccount Set Balance = Balance + 10 WHERE ID =2

Connection 1 is moving $10 from account 1 to account 2. Connection 2 in the meantime just read that there is a total of $10 less than actually exists. Nothing rolled back, but you've just completely lost atomicity.

Tim C on August 27, 2008 6:22 AM

Hey Jeff,

You're not storing the Id of the original post (primordial one) in it's ParentId col are you? This would obviously introduce immediate overlap and a race-condition for the offending (parent) post. I can't believe it would be that obvious but, if it is you can SQL-around that one with an extra clause to ignore that particular record. I'm reading waaay too much into a DM that i didn't design!

I think in the case of this race condition to re-acquire a Read lock on a matching row amongst the Update and Select you should perform the retry as per direction via the db error. Have you tried this as a default option before applying hints that result in dirty reads for some of your users? It smells like your data isn't that concurrent anyway - the select seems to pull those records that were related to a parent record which the UPDATE is isolated to changing. I suspect your comment about denormalisation i.e. storing responses to an inital post in the SAME table have A LOT do with deadlock as normalising parent-child fashion would result in separate tables used in each query.

Anyway fundamentally it's perhaps a usability choice - do you allow your users to see dirty data? Would they know it was dirty? Or do you trap the error when you know data to be dirty coz SQLServer is telling you it is and requery on behalf of the user. For me - the system should do this but, the cost should be measured in terms of trapping the error and requerying and it's affect on your desires to scale up.

chorleyboy on August 27, 2008 7:46 AM

Deadlocks are caused by the Database Designer and the Developers. It is always possible to write a system that is incapable of causing a Deadlock. Of course, this is the same as saying it is possible to write code that has no bugs and therefore easy in theory yet incredibly hard in practice.

Turning COMMITTED SNAPSHOT on can result in two identical SELECT statements within the same transaction returning different data. These two SELECT statements are both returning fully correct and valid data. However, that is the reason is is not turned on by default.

I've seen two brilliant comments here:

Ian Nelson stated Imagine how pissed you'd be if SQL Server treated your banking application like a silly little web app.

S stated It's not 'silly' that SQL Server treats your website as a banking operation; personally I would expect nothing less. Correctness is a vastly more important than performance. Only the developer can trade off correctness for performance; and it is rarely needed.

I couldn't agree more with these statements, you as the developer has to be the one to choose what level of locking you require. There is a time and a place for all of them, from the pessimistic default to the much feared (nolock). However, as the majority of us Developers do suck, I'm rather glad Microsoft have taken the default approach they have, even if it is unnecessary in a large number of cases.

If they were to justify reducing the level to READ COMMITTED SNAPSHOT, then they could in fairness reduce the default to READ UNCOMMITTED, how many people would that affect?

Robin Day
http://www.advorto.com

Robin Day on August 27, 2008 8:03 AM

The devil is in the details, and in this case there are a lot of details. I'd like to echo the sentiment expressed by some people that I do hope the answers on stackoverflow will contain a little less vitriol. Lock issues are hard, even seasoned DBAs can run into them on occasion. Having done a database course or two does not make you anymore of an expert on the exact cause of these issues and it doesn't make you smarter than Jeff.

Other than that, I hope the type of person who likes to insult other people's intelligence will end up with appropiately low karma so I can just ignore their posts on the eventual stackoverflow. (which I'm not beta testing cause I'm lazy ;) )

wds on August 27, 2008 10:17 AM

To illustrate read uncommitted, if all your selects used read-uncommitted, you could have

employee ID 12 is salary of $50

SELECT Salary FROM Employee WHERE ID = 12

phantom data a possibility
0s Process A - starts transaction - multiplies all salaries by 100
1s Process B - reads salary for user X (with nolock - read-uncommitted) .. read value is $5000, or $50, or both
20s Process A - does heaps of other things and then rolls back transaction

whereas in snapshot mode
0s Process A - starts transaction - multiplies all salaries by 100
1s Process B - reads salary for user X .. read value is $50
20s Process A - rolls back transaction or commits transaction

So there's nothing wrong with snapshot data, because it's not committed, it just prevents you from seeing data before it's committed, but doesn't make you wait. It's like you just reordered the read to come before the write started. It's a better version of NOLOCK, but it does come at a price. And maybe the reordering doesn't make sense to your application.

So the only alternative is you block readers from reading data that is under change.

0s Process A - starts transaction - multiplies all salaries by 100
1s Process B - wants to reads salary for user X - lock wait
20s Process A - rolls back transaction
21s Process B - wants to reads salary for user X .. read value is $50

And as pointed out earlier, if you block readers for rows that may change, and the reader is locking tables in a different order than writers, then you have deadlock.

monty on August 27, 2008 11:02 AM

Robin Day Wrote: If they were to justify reducing the level to READ COMMITTED SNAPSHOT, then they could in fairness reduce the default to READ UNCOMMITTED, how many people would that affect?

I think you aren't understanding the difference between these two, and I think Jeff isn't either, as he said: With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it's never wrong or garbled or corrupted in a way that will crash you.

Read uncomitted doesn't mean data is out of date, it means you are seeing data BEFORE its in date. It means you are seeing the absolute latest *uncomitted* data. Read Committed Snapshot on the other hand means that you ALWAYS see the latest *committed* data. Read Committed Snapshot allows for 100% ACIDity. Read Uncomitted allows for 0%.

Tim C on August 27, 2008 12:00 PM

Tim C:
Read uncomitted doesn't mean data is out of date, it means you are seeing data BEFORE its in date. It means you are seeing the absolute latest *uncomitted* data. Read Committed Snapshot on the other hand means that you ALWAYS see the latest *committed* data. Read Committed Snapshot allows for 100% ACIDity. Read Uncomitted allows for 0%.

Almost, until that last sentence. The %-age is unknown a priori. The *only* time that UR data is wrong is when the Uncommitted transaction is Rollback-ed. In no other case is it wrong. Depending on the nature of the system (i.e. no programmatic Rollback, no Deadlock, and Timeout), it can be 100%.

BuggyFunBunny on August 27, 2008 12:37 PM

Lastly, this is why database transactions should be deliberately designed rather than be the bastard step-child of the code's business layer use-cases.
Robert C. Barth on August 25, 2008 10:56 AM

Hallelujah. Like object databases, this is another useless idea designed to make programmers think they can write code by clicking about a few times in an IDE. To design a working system like you want, you need to do it the old-fashioned, boring but reliable ways. LINQ is just like the Ruby programs that are finding they don't scale, they're nice ideas, but that's all they are.

If you are designing something that will be putting food on your table, you want it to work. If you're designing a playground for yourself, only then can you use whatever you like.

AndyB on August 28, 2008 4:02 AM

Simple solution: Change to Firebrid

Cornie on August 28, 2008 6:04 AM

@Tim C:
How about this case?

This is a non-atomic transaction. What happens depends on the implementation. And in an OLTP system (really, any multi-process one), such a read will always be *wrong*, since there are multi-transactions (whether atomic or not) at any point in time. The read will interleave some writes; therefore it is *wrong*. If you've setup uour database (and it supports this) to read from buffers immediately (recognize commit on buffer update; don't wait for flushes, table or log), you shouldn't see the issue. But as others have mentioned, MVCC rather than locking is the answer.

BuggyFunBunny on August 28, 2008 6:06 AM

LINQ is just like the Ruby programs that are finding they don't scale, they're nice ideas, but that's all they are.

I'd agree for using it with SQL, but in-memory objects like Arrays, Lists, loaded XML documents, etc... LINQ has been incredible.

HB on August 28, 2008 7:17 AM

FWIW,

I provide an explanation of why and how deadlocks can occur with only SELECT and UPDATE statements outside of transactions

http://www.samsaffron.com/archive/2008/08/27/Deadlocked+

Sam on August 28, 2008 7:22 AM

Funny thing:
During one project we found out that running parallel database queries didn't improve speed over serialized database queries.
(E.g. we had similar queries running one minute each. Running
them in parallel it took 6 minutes and then all of them finished.
So we didn't win anything by parallel execution - even worse the
response time for all users got unacceptable).

So the trouble about locking and concurrent operations and everything
was pointless - batching all queries and running them one after the
other didn't slow down the system and all problems about locking
and stuff were gone.

Sometimes I really wonder if todays heavily optimized x86-CPUs
really run faster than a overclocked 6502 CPU (CBM 64) running
on 3 GHz ;-)

Erik on August 28, 2008 10:24 AM

@Sam - insightful stuff. Another thing to keep me from sleeping at night.

David B on August 28, 2008 11:29 AM

Is it possible to somehow filter out suggestions like Use MySQL or Switch to Linux or Oracle is the best! or Microsoft rules! or I love Macs! or whatever. I don't mean just from this blog's comments, I mean from life in general.

Tippy Hedren on August 28, 2008 12:21 PM

One of the 'underrated' causes of deadlocks is the absence of covering indexes (i.e. non-clustered covering indexes in the SQL Server world). Grossly simplfying, reads (SELECTs) that can be satisfied wholly from a covering index, will not require an extra lookup into the clustered index (which is predominately where the locking for updates or inserts will be occurring)

Mitch Wheat on August 29, 2008 8:29 AM

You need to look at what else is happening in the transactions that are deadlocking, its the transactions that hold the locks, not the individual SQL statements.

As Jasmine said above, for a deadlock to occur one transaction must have locked resource A and then tried to lock resource B, then the second transaction must have locked B, and tried to lock A, causing the deadlock.

In the case of read-write locks, it's also possible that the both transactions have (non-exclusive) read locks on B and one has a (exclusive) write lock on A.

Once you have understood this, you can solve the problem using a technique called Structural Deadlock Avoidance. It's pretty simple, but it requires a bit of hard work and discipline. It works like this:

1) assign a unique number to each resource that you want to lock. E.g., resource A = 1, resource B = 2.

2) when locking multiple resources, always lock the one with the lowest number first, then the next lowest, .... E.g., lock A first, then lock B.

If you do this you can see that you'll never deadlock, because you always lock resources in the same order.

This technique works with any kind of locking construct, object locks in Java or .NET, file locks, semaphores, or database locks.

In a database the resources you number are tables. E.g., TABLE A = 1, TABLE B = 2, ... then you order your transactions like this:

Transaction 1:
BEGIN
SELECT ... FROM A ...
UPDATE B SET ...
COMMIT

Transaction 2:
BEGIN
UPDATE A SET ...
UPDATE B SET ...
COMMIT

I've been using this technique in a variety of environments for more than 15 years. As long as I don't take shortcuts, I never have deadlocks. Hope it helps you.

HughE on August 29, 2008 9:57 AM

Holy shit! These comments (inc. Jeffs original post) display the staggering amount of ignorance there is about SQL Server.

It's truly scary - A good 60-70% of comments are clueless, particularly the ones just saying use NOLOCK. Oh dear. Makes me realize how lucky I am to have a good understanding of the product. Conversely it makes me realise why so many database applications are fcked!

If you actually understand how to program SQL Server its one of the best performing DBMS in the world, not to mention to easiest to develop for.

JackU on August 30, 2008 8:40 AM

I like Dave and his writings. But, for the life of me I can not understand why anyone would want to run Windows SQL Server when they can just as easily Install and Run MySQL.

MySQL can run on a Windows Server. I am not sure if .NET/ASP.NET has API's for MySQL, but if it does, switching to a better DataBase is what I would recommend.

I run 4 sites that have massive reads and writes and Im running a very old MySQL version (3) and have never had any problems. Other than when the server crashes, for whatever reason, and even than all I have to do is run the repair tool to reconstruct any broke record that may have corrupted.

Matt K. on August 31, 2008 5:22 AM

There is a fundamental difference between database dead-lock and 'the Dining Philosophers puzzle' dead-lock you would see in concurrent programming using 'lock'.

The deadlock described is 'Philosopher' is a state at which none of process can continue because circular dependency. ( that is until you force one to give up ). This happens when a process A has lock on X and requires lock on Y and a process B has lock on Y and wants to lock on X ) There is an inherit resource inter-dependency conflict.

In the database, 'dead-lock' can happen when a rough process can lock the table - there is no lock on two resources. 'select * from table' may look innocuous but if it does not release lock on the table, any subsequent 'update' statement may cause 'dead-lock'. It really depends on the database to determine which statements to terminate. In DB2 case, it was 'update' statement but in MS-SQL it may be 'select *'.

'select *' is not innocent. If your program holds to the result set and iterate over very 'slowly', other operation that require 'write' lock maybe be dead-lock with 'select *'. It is just resource contention and no circular dependency. This problem was very self-evident on my case with DB2. The developer was doing 'select *' from very large table using DBArtisan. DBArtisan continues to hold on to the result set thus the table is locked out, causing any subsequent update to fail.

Detecting 'dead-lock' is _very_ difficult. Your first instinct ( 'SELECT *' cannot cause this ) may lead you to wrong path. I hope you find your cause soon.

Good luck.

Michael Lee on August 31, 2008 10:28 AM

Based on the philosopher example, simple switching is all you need

http://www.dragonlasers.com
Sod

Sod on September 1, 2008 2:17 AM

Jeff's point in one of his followup posts is the critical one: the DBMS should, at user's option, be able to handle deadlock automatically, without user intervention. We are willing to accept some epsilon-level error on one side (dirty read, error in time synchronization, whatever) in exchange for not having our query barf on us for reasons created by the specific circumstances, i.e., someone else is competing with us for scarce resources.

In situations where code must be of very high reliability (larger organizations like mine), deadlock failures are simply unacceptable, period. The fact that they can arise through no fault of one's own query is a huge Achilles' heel to basic SQL.

The other Achilles heel, btw, is the ability of a single client with a bad query to bring the server to its knees, also a simply 100% unacceptable event. I imagine both of these server weaknesses can be tuned out by a skilled DBA, but I certainly don't know how, and we see these kinds of failures all the time.

John Pirie on September 1, 2008 7:42 AM

Just a comment on deadlocks and the with (nolock) option - That basically says that you don't care what state the data is at. That is blocked by default because it is dangerous. Information that is dirty might be about to be undone. You can't seriously say that this doesn't matter because it isn't a banking app, because ALL information that is being read from the database is being read for a reason. Information is acted on, either by a computer or by a user (even if that just means displaying to a user so they can make decisions). Even somethign as simple as a date of a post: Oh - I won't respond to that post because it was made last year. or That concept is outdated... no - hang on, they just posted today.

Effectively, you are allowing decisions to be made based on information that will be wrong some of the time. That doesn't make sense. Seriously - why are we bringing back information from the database if no decisions are based on that information? Useless information would never need to be retreived.

SOMETIMES I will use with (nolock) - especially when I know the information I am returning can't be changed (one part of a row may be static, and another part may change). Also - if the decision being made by the information is allowed to be wrong some of the time (such as general monitoring processes), then I will also use it.

BUT - I agree with others that it is how you are trying to acheive your result that is the issue. A deadlock is a bit more complicated than 1 process trying to get access to a resource while another already has it. A deadlock is 2 processes trying to get to eachother's resource at the same time, but neither will let go of their resources until they complete. So your problem isn't with the Select, your problem will be that you have TWO updates. SQL Server only killed one of them, which is the one you had reported to you. Seriously - do a profile to determine what is running at the time the deadlock occured, and it will be clear the processes that are clashing (I use Lumigent Log Explorer, and it makes it very clear which resources are in conflict).

As for the select:

SELECT *
FROM [Posts]
WHERE [ParentId] = @p0

If you had not used an asterisk (which is a BIG NO NO) AND indexed the information you were after, you may have found that no block due to the lock would have occured at all. That is, SQL Server 2005 on allows you to include data at the bottom page of the index. Once the index has been used to find the data, it can then return the data to you without reading from the actual table!!! If this data is not being edited, the index should not be locked. So that performing this query:

SELECT [UserName], [Topic]
FROM [Posts]
WHERE [ParentId] = @p0

Stting the ParentID's Index to have username and topic columns included with the index (look up the TSQL on this to see how this is created). This would locate your information FROM THE INDEX and never need to touch the data, and never need to even know the row was locked (take that MYSQL).


As for database technology (others have commented on this):


I have done speed comparisons, and for small apps I agree that MySQL is OK. But for larger apps Microsoft SQL Server out-strips MySQL like a Car vs a scooter. Then there is the power of Microsoft SQL Server. Sure MySQL is getting better, but there is nowhere near the programming ability. So it is still horses for courses - and as long as the SQL is iso standard, a user can still upgrade from MySQL to SQL Server. However, my personal thoughts are that using SQL Server throughout is a prefered option due to scalability, power and it is the tool our developers are most familiar with..

Philip on September 1, 2008 7:57 AM

Using parameterised SQL statements (as in the shown code) can cause SQL to not use indexes, thus causing table scans, which can cause table locks.

On analysis, I have never found this to be true. At least not since SQL Server 2000.... But this is a very Key issue - you always need to profile your system. How things work depend on your hardware, your data, your data structures and indexes.

Philip on September 2, 2008 6:06 AM

«Back | More comments»

The comments to this entry are closed.