The Hot/Crazy Solid State Drive Scale

May 2, 2011

As an early advocate of solid state hard drives …

… I feel ethically and morally obligated to let you in on a dirty little secret I've discovered in the last two years of full time SSD ownership. Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty.

I bought a set of three Crucial 128 GB SSDs in October 2009 for the original two members of the Stack Overflow team plus myself. As of last month, two out of three of those had failed. And just the other day I was chatting with Joel on the podcast (yep, it's back), and he casually mentioned to me that the Intel SSD in his Thinkpad, which was purchased roughly around the same time as ours, had also failed.

Portman Wills, friend of the company and generally awesome guy, has a far scarier tale to tell. He got infected with the SSD religion based on my original 2009 blog post, and he went all in. He purchased eight SSDs over the last two years … and all of them failed. The tale of the tape is frankly a little terrifying:

  • Super Talent 32 GB SSD, failed after 137 days
  • OCZ Vertex 1 250 GB SSD, failed after 512 days
  • G.Skill 64 GB SSD, failed after 251 days
  • G.Skill 64 GB SSD, failed after 276 days
  • Crucial 64 GB SSD, failed after 350 days
  • OCZ Agility 60 GB SSD, failed after 72 days
  • Intel X25-M 80 GB SSD, failed after 15 days
  • Intel X25-M 80 GB SSD, failed after 206 days

You might think after this I'd be swearing off SSDs as unstable, unreliable technology. Particularly since I am the world's foremost expert on backups.

Well, you'd be wrong. I just went out and bought myself a hot new OCZ Vertex 3 SSD, the clear winner of the latest generation of SSDs to arrive this year. Storage Review calls it the fastest SATA SSD we've seen.

Beta firmware or not though, the Vertex 3 is a scorcher. We'll get into the details later in the review, but our numbers show it as clearly the fastest SATA SSD to hit our bench.

While that shouldn't be entirely surprising, it's not just faster like, "Woo, it edged out the prior generation SF-1200 SSDs, yeah!" It's faster like, "Holy @&#% that's fast," boasting 69% faster results in some of our real-world tests.

Solid state hard drives are so freaking amazing performance wise, and the experience you will have with them is so transformative, that I don't even care if they fail every 12 months on average! I can't imagine using a computer without a SSD any more; it'd be like going back to dial-up internet or 13" CRTs or single button mice. Over my dead body, man!

It may seem irrational, but … well, I believe the phenomenon was explained best on the television show How I Met Your Mother by Barney Stinson, a character played brilliantly by geek favorite Neil Patrick Harris:

Barney: There's no way she's above the line on the 'hot/crazy' scale.

Ted: She's not even on the 'hot/crazy' scale; she's just hot.

Robin: Wait, 'hot/crazy' scale?

Barney: Let me illustrate!

[Image: the hot/crazy scale]

Barney: A girl is allowed to be crazy as long as she is equally hot. Thus, if she's this crazy, she has to be this hot. You want the girl to be above this line. Also known as the 'Vickie Mendoza Diagonal'. This girl I dated. She played jump rope with that line. She'd shave her head, then lose 10 pounds. She'd stab me with a fork, then get a boob job. [pause] I should give her a call.

Thing is, SSDs are so scorching hot that I'm willing to put up with their craziness. Consider that just in the last two years, their performance has doubled. Doubled! And the latest, fastest SSDs can even saturate existing SATA interfaces; they need brand new 6 Gbps interfaces to fully strut their stuff. No CPU or memory upgrade can come close to touching that kind of real world performance increase.

Just make sure you have a good backup plan if you're running on a SSD. I do hope they iron out the reliability kinks in the next 2 generations … but I've spent the last two months checking out the hot/crazy solid state drive scale in excruciating detail, and trust me, you want one of these new Vertex 3 SSDs right now.

Posted by Jeff Atwood
126 Comments

I use my SSD fully expecting it to fail. Just like I date crazy girls fully expecting them to stab me: Always have that backup plan!

My SSD simply holds my OS and apps, while my big mechanical drive (anything Western Digital Caviar Black) holds my docs. The projects that require serious I/O (like my localhost and 20 Drupal sites) run off the SSD but are, of course, version controlled through Git. That drive could nuke before I'm done with this post and I would be just fi

Bloomcb on May 2, 2011 2:36 AM

Hey thanks Jeff for posting..
I have been thinking about SSD's for a while now, and wondering what their reliability is like.. given what you said, it certainly sounds fine to put up with the hassles for the enhanced speed. Considering most of what I do these days is either backed up locally via NAS, and Dropbox, and GIT for dev work, it really isn't like the bad old days (I remember buying an Amstrad PC-1512 - ok, I am originally from the UK, and that was the first real PC I could afford back then - it had a WD 10mb HD that fitted into a card slot.. wow that thing failed..).

So, my rather modest AMD based laptop should squeal with delight once I get an SSD for it... happy hardware is good hardware!

Let the madness begin!

David Sheardown on May 2, 2011 2:42 AM

Do you know why they are failing? Have you had any problems getting free replacements? Have the manufacturers said why the reliability is so bad?

It might be costly, but would you consider buying 2 and using RAID 1, so that if 1 fails, the other one will still be working?

Paul on May 2, 2011 2:42 AM

Let's do the math:

Average life of SSD = 227.375 days (based on Wills' data)
Price of recommended SSD = $524.99
--
SSD tax = $2.31/day, ~$70/month, ~$843/year.
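
The arithmetic above is easy to re-run. A minimal sketch in Python, assuming a 30-day month and a 365-day year; the failure times are the eight from Portman Wills' list in the post, the price is the one quoted above, and the function name is made up for illustration:

```python
# Days each of Portman Wills' eight SSDs lasted, as listed in the post.
DAYS_TO_FAILURE = [137, 512, 251, 276, 350, 72, 15, 206]
SSD_PRICE_USD = 524.99  # price of the recommended SSD quoted above


def ssd_tax(days_to_failure, price_usd):
    """Return (average life in days, cost per day), treating every failure as a full-price repurchase."""
    average_life = sum(days_to_failure) / len(days_to_failure)
    return average_life, price_usd / average_life


average_life, per_day = ssd_tax(DAYS_TO_FAILURE, SSD_PRICE_USD)
print(f"average life: {average_life} days")
print(f"per day: ${per_day:.2f}")
print(f"per month (30 days): ${per_day * 30:.0f}")
print(f"per year (365 days): ${per_day * 365:.0f}")
```

Note that this treats every failure as a full-price repurchase and rests on a sample of only eight drives, so a warranty replacement would lower the figure considerably.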

Can Berk Güder on May 2, 2011 2:45 AM

So really, SSDs are just the Netflix Chaos Monkey of your personal backup strategy :)

Troy Hunt on May 2, 2011 2:52 AM

A bit more math (wish I could edit the previous comment):

4 TB RAID 01 or 10 setup using WD Caviar Blacks = $640
1.2 TB RAID 01 or 10 setup using WD VelociRaptors = $1,000

Obviously this is only feasible for a desktop, but for $115+ more you get:

  1. 5 to 16 times the capacity
  2. Years, not months, of life
  3. Instant and automatic backups
  4. Comparable speed for most practical purposes

Can Berk Güder on May 2, 2011 3:07 AM

@Can Berk Güder, comparable speed? Sequential, sure, but nobody cares about that on SSDs except marketing weasels. What we care about is random 4K, which the Vertex can do 250MB/s incompressible, while HDDs struggle to do 1MB/s each (the high-density - 3TB - 7200 3.5" ones, smaller/older are far worse).

Oh and... try sticking those 3.5" drives into a notebook. You could fit two VelociRaptors, but notebooks don't supply 12V to the drives, so they would get no power.

Mircea Chirea on May 2, 2011 3:23 AM

@Can Berk Güder, anyone saying that SSDs and HDs offer comparable performance just hasn't used an SSD.

Poromenos on May 2, 2011 3:28 AM

@Can Berk Güder, pardon me if I misunderstood, but what makes the Black array an instant and automatic backup, especially compared to the SSD array?

Apps 55753818692 823210138 54f697b0f292c60aa6e0030f691186d2 on May 2, 2011 3:43 AM

@Mircea and @Poromenos

Honestly, I've used neither (SSDs or a RAID setup). I'm not denying SSDs are fast, nor am I trying to bash them. I've also pointed out that this setup wasn't feasible for a laptop.

The point I'm trying to make is, I think Jeff is looking at this from a very strange angle. When I look at the data, I say "SSDs are blazing fast but they fail like crazy." Jeff says "they fail like crazy, but I still love them, so let's buy more SSDs."

I've been keeping an eye on SSD prices (which are much higher here than the US) for a long time now, but I think I'll wait a bit longer after reading this post. I just can't afford to replace a $1k drive every 8 months.

Can Berk Güder on May 2, 2011 3:54 AM

@Apps 5575 RAID mirroring (RAID 1). who mentioned an SSD array?

Can Berk Güder on May 2, 2011 3:57 AM

I am curious. If SSDs fail so often, why do manufacturers list MTBF and warranty figures similar to those of HDDs?

From OCZ Vertex 3 240GB:

MTBF - 2 million hours
3 year warranty

Alexandru Moșoi on May 2, 2011 4:07 AM

I agree with your comments. I fitted out a 6 year old ThinkPad X60 with a Vertex 2 last month and it now feels faster than my quad-core i5 desktop with 6 GB of RAM. *Sigh*.

I wanted to ask if anyone knows much about the SMART monitoring on SSDs, and whether it can predict these failures. If they are failing due to running out of "good" flash to spare, then presumably such a prediction should be really accurate - a monitoring program should be able to watch SMART prefailure stats and pop up "Buy a new SSD in the next 6 weeks, or your data is toast!". If they're failing for some other reason then I guess all bets are off, though.

Angus on May 2, 2011 4:20 AM

@Can Berk Güder, RAID is for high availability and/or performance; it's not a backup. High availability allows your server to keep running while one of its disks is dead. It does not protect you from "oops, I deleted all my work!". A backup should.

FooBarTwit on May 2, 2011 4:23 AM

"it'd be like going back to dial-up internet or 13" CRTs or single button mice. Over my dead body, man!"

http://whitewhine.com

Just sayin'

Slavo on May 2, 2011 4:24 AM

Shouldn't all of those drives have been under warranty when they failed?

I have a 64 GB Patriot SSD that's three years old and still going strong. It came with a ten year warranty which seems pretty incredible. I wonder what their replacement strategy is in nine years.

Anyway, now I am paranoid and off to double check my backups.

William Furr on May 2, 2011 4:48 AM

I'd be willing to bet the real failure rates, while higher than those of platter HDDs, are not as high as this sample set would lead us to believe. I have two SSDs (Super Talent and Intel) with a total running time of about 1150 days with no problems. That's not to say they won't fail tomorrow, but I keep my backups up to date so I am not really afraid of the possibility.

J McCaw on May 2, 2011 5:01 AM

@FooBarTwit I assumed what I meant by "backup" would be pretty obvious from the context.

Can Berk Güder on May 2, 2011 5:02 AM

The MTBF of any drive is clearly not 15 days. That's plainly just a warranty issue.

Chris on May 2, 2011 5:04 AM

I can confirm your experiences; I built a PC for my mother and couldn't understand what was wrong when it wouldn't start. Windows wouldn't boot, and neither would any of the safe modes. Pure dead.

I was stumped because I thought "It's only been in here less than a year," it can't be the SSD. The worst part, as you mentioned, is that it was a total catastrophic failure like I've never seen with HDDs. I managed to recover the data using boot tools, but couldn't write to the drive, perform certain diagnostics, format or erase partitions. It was disastrous.

OCZ replaced it without hesitation (Vertex 40GB), but it taught me a lesson about the _real_ reliability of SSDs. Naturally, I have still used SSDs in my last two laptops!

PS: The OWC 6gb/s SSDs are supposed to be the fastest, most reliable SSDs around I've heard.

David Doran on May 2, 2011 5:09 AM

My AData S599 failed in less than 6 months. It's all your fault ;) Like you, I would never go back to a mechanical drive, but I'd like a bit more than 6 months...

Kearon.blogspot.com on May 2, 2011 5:14 AM

Odd, the particular brand i'm using leaves a 3 year warranty and a life expectancy for a million hours. How is that even close to a good deal for the manufacturer?

Backup? Naaaah, but sourcesafed and nothing you can't reinstall on the system drive.

Crypth on May 2, 2011 5:24 AM

Is it just that the reliability of the new hotness isn't there yet? Is last year's slower model more reliable, or can you pay more for more reliability?

Joel Hess on May 2, 2011 5:25 AM

All this fuss - the future is here already and it isn't the current type of SSD. Keep your data & apps away from your computer, whatever form your computer takes, and let your ISP & cloud provider have the hassle (check that you have a decent service level agreement).
I know speed and network availability are occasionally an issue at the moment but that will improve.
I can see a few situations where SSDs are ideal at the moment - some Formula 1 teams use them because the vibrations from the car engines break the hard drives in the pit team's laptops (scary!)
I've still got conventional drives in my PCs, laptops and the Thecus SAN that's been going strong for 3 years.

Dcbcherrygate on May 2, 2011 5:40 AM

I've used SSDs. I know what they're like, and for laptops I would use nothing else on account of the greater physical robustness.

But I can't agree with the "Crazy/Hot" scale. The fact of the matter is, outside a few corner cases, SSDs for desktop storage don't provide any particularly useful performance advantage. OK, your system boots faster. This will add up to entire minutes saved up over an entire year!

I will lose more time to having to replace faulty disks, restore from backup, and redo work lost since the prior backup, than an SSD will ever save. The SSD will save a few seconds a day. The failure will cause downtime of several days. That's simply not a good trade-off.

I mean, sure, if I were compiling Linux kernels or Chrome all day, every day, then yes, I would probably get a net time win from using the SSD. But I'm not; nor are most people. Continuous high levels of random I/O just aren't a common workload.

SSDs as a cache or scratch area are, however, another story. If the SSD fails, you just lose the cache--a performance issue, not a correctness issue. That's a much more favourable outcome.

Drpizza on May 2, 2011 5:56 AM

I've heard that when an SSD fails it essentially becomes read-only. Have you found this to be the case, Jeff?

Michael Dudley on May 2, 2011 6:20 AM

Just for the record, my Dell M1330 w/SSD (same one Jeff has/had) is still rocking after 3 1/2 years (nVidia graphics was replaced under warranty). It's mostly used now for video recording; prior to that it was used for development and presentations.

I also have an Intel 160 GB X25M Gen2 drive in my desktop, and it's just over a year old. Mostly used for development and video editing / production.

Neither of these were the fastest of the moment drives, but they put up good random access numbers in tests (well beating physical drives) and stood out in stress tests. I also made sure I understood how to install Windows correctly to take full advantage of the drives (i.e. TRIM and such).

I could just be lucky, but I think when you are going to be an early adopter it pays to stay off the full bleeding edge and not chase one stat (speed) at the cost of another (reliability).

Oh, the Dell duties were replaced by a 13" Mac Book Air... with SSD =)

Michael Neel on May 2, 2011 6:52 AM

I've got the 160GB X25-M (Gen2) in my M1530 for nearly every type of use case, and it's still going strong (almost 1 year old, knock on wood).

What exactly are the usage scenarios you guys are working with that destroy these SSDs? It doesn't make any sense.

This article seems to cultivate paranoia now when deciding on getting an SSD.

Sebastian on May 2, 2011 6:59 AM

100% agree, SSD is a 'hot' technology. And my 3 years old Dell Latitude E4300 with its 128GB never had any drive problem, while I am using it more than 50 hours a week since.

Patrick Smacchia on May 2, 2011 7:07 AM

Wow, your friend is either super unlucky or must have some environmental issue (heat or power related).

I have had no issues with my 160GB Intel Gen2 (I don't remember how old it is, but about 15TB written so far according to the SMART indicator). Worried that I might be getting close to the write limit, I recently purchased another 120GB SSD (also Intel). No problems there either. Maybe it's because I live in a rather cold timezone :)

Intel has also published annual failure rates based on roughly 800,000 sold SSDs, and from memory I seem to recall it was 0.4x%. For now I'm inclined to take them at their word; at least no one I know personally has had any SSD failures yet (knock on wood).

Btb on May 2, 2011 7:13 AM

Me, I'm tired of 'promise' followed by failure. Give me a good ole slow drive that's going to make it twenty years please. Life is complicated enough without having to constantly swap out hardware...

Paul.

Paul W. Homer on May 2, 2011 7:34 AM

Good to know I'm not the only one being paranoid :-)

MBP was out of warranty so I replaced the DVD with an SSD. Best of both worlds. User data on HD. OS & frequently used apps on SSD (with nightly backups to HD). Wicked fast. Loving those boot times.

However... the SSD has crashed *four* times in the last six months, once so badly it bricked and had to be replaced. Boy, you nailed it with crazy/hot.

I'd like to upgrade to a new MBP but don't want to give up the dual drive thing without having to break the warranty. Laptop makers really should make the built-in DVD optional.

Ramin Firoozye on May 2, 2011 7:37 AM

I've had my 80GB Intel X25-M for 10 months now and it hasn't failed yet. But I've run out of space. I was using it only for the OS and programs. All my documents/pictures/video were on a 1 TB HDD. Well, the Windows folder is more massive than ever at 22GB and the two Program Files folders add up to 25GB. So just yesterday I replaced it with a 160GB Intel 320 series. I'm hoping it'll be a dependable drive.

The funny thing about SSD reliability is that it would hardly be a concern if manufacturers only sold SLC drives. With SLC, each cell stores a single bit of data. But everyone is buying the MLC drives because they are at least 5X cheaper. MLC stores multiple bits per cell, so the controller has to distinguish between more charge levels in each cell, which makes it harder (less reliable) to get a positive fix on a value. So much so, in fact, that an SLC drive is expected to live around 10X longer than an MLC drive.

It's a couple years old, but there is an excellent article on Tom's Guide covering some of these finer points...
http://www.tomsguide.com/us/ssd-value-performance,review-1455.html

They also talk about wear leveling algorithms and how a drive's lifetime increases with capacity. If all other things were equal, my 160 GB drive should have roughly double the lifetime of my 80 GB drive that preceded it.
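
The capacity claim can be sketched with a very rough endurance model. Everything numeric here is an assumption for illustration: the program/erase cycle count, the daily write volume and the write amplification factor are hypothetical values, not figures from the article, and the model assumes ideal wear leveling that spreads writes evenly over every cell.

```python
def endurance_days(capacity_gb, pe_cycles, gb_written_per_day, write_amplification):
    """Days until every cell has used up its program/erase cycles, assuming ideal wear leveling."""
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / gb_written_per_day


# Hypothetical flash and workload parameters, chosen only to show the ratio.
PE_CYCLES = 5000
GB_PER_DAY = 20
WRITE_AMPLIFICATION = 2.0

small = endurance_days(80, PE_CYCLES, GB_PER_DAY, WRITE_AMPLIFICATION)
large = endurance_days(160, PE_CYCLES, GB_PER_DAY, WRITE_AMPLIFICATION)
print(f"80 GB: {small:.0f} days, 160 GB: {large:.0f} days, ratio: {large / small}")
```

Whatever values you plug in, doubling the capacity doubles the result, which is the "all other things being equal" point. The model only covers cell wear and says nothing about any other way a drive can die.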

Steve Wortham on May 2, 2011 7:48 AM

Seconding Paul W. Homer.

Yeah, I used to date crazy girls. Then I grew up and married somebody sensible. If that makes me an old guy, then just mix me up my laudanum and wheel me into the corner where I won't be any trouble.

Aaron Em on May 2, 2011 8:24 AM

Despite the failures, if the productivity gain compensates, it's worth it...
I got my Intel 320 SSD (betting on stability) and am planning to buy an OCZ Vertex 3 for my next PC :)

Amitabh Mritunjai on May 2, 2011 9:05 AM

I have one copy-editing quibble with your article:

Just make sure you have a good backup plan if you're running on a SSD.

This sentence is too long by six words. :-)

One should ALWAYS have a good backup plan. I don't care if you've got a storage system that somehow magically carves the bits into granite with a frickin' laser. Have a backup plan.

Actually, ideally, have two!

Uncle Mikey on May 2, 2011 9:14 AM

What's the deal with all these SATA drives? Am I the only one using a PCIe version? No need to worry about sandy bridge or 6gb/s... only drawback so far is that it doesn't have trim support (due to it using raid 0):
OCZ RevoDrive PCIe SSD 120 GB

Meg Noz on May 2, 2011 9:21 AM

I am going to be building a new machine and was considering using a SSD for the OS. Now I want one even more, and am also more scared about it. Maybe if I keep all my files on a second regular HDD I'll be fine. +1 for the Barney quote, I love that show.

Jisaacks on May 2, 2011 9:33 AM

Crypth: MTBF is not not NOT "expected lifetime".

The three year warranty is there because they expect it to last three years, not "a million hours" (114 years).

MTBF/MTTF is, at best, a statistical measure of how long you should expect to go between failures if you have a large mass of drives operating.

http://db.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html - MTBF is, really, a damned lie when it comes to computer storage.
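
To put a number on the "large mass of drives" reading, here is a minimal sketch in Python. It assumes a constant failure rate (an exponential model), which is a simplifying assumption made for this sketch and not something the spec sheets state; the function names and the fleet size of 1000 are also made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def mtbf_in_years(mtbf_hours):
    """The naive (and misleading) reading of MTBF as a lifetime."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_rate(mtbf_hours):
    """Chance that one drive fails within a year of continuous use, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


mtbf = 1_000_000  # the "million hours" figure discussed above
afr = annual_failure_rate(mtbf)
print(f"naive lifetime: {mtbf_in_years(mtbf):.0f} years")
print(f"annual failure rate: {afr:.2%}")
print(f"expected failures per year in a fleet of 1000 drives: {1000 * afr:.1f}")
```

Read this way, the figure is a claim about how many drives in a big population fail per year, not a promise that any single drive lasts a century.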

Sigivald on May 2, 2011 9:37 AM

Let's say it takes 4 hours to order and buy the replacement, and 4 more hours to bring the system completely up to date. Let's also say you earn $52,000/year (roughly twice what the average US worker makes, AFAIK). So you lose 8 hours of pure time, plus 520 (price of the recommended SSD) / (52000 / (365*24)) = 87.6 hours' worth of money. It'll be profitable only if it saves you more than about 95 hours over its lifetime of 227 days.
So that SSD is only profitable if it saves you more than about 25 minutes each day...
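
The break-even estimate above can be re-run as a small sketch. The inputs are the ones from this comment; the function name and the `paid_hours_per_year` parameter are hypothetical, added so the effect of spreading the salary over all 8760 hours of the year (as the comment does) can be compared against a 2080-hour working year, which is an assumption introduced here.

```python
def break_even_hours_per_day(salary_per_year, ssd_price, downtime_hours,
                             lifetime_days, paid_hours_per_year=365 * 24):
    """Hours the SSD must save per day to pay for itself before it fails."""
    hourly_rate = salary_per_year / paid_hours_per_year
    price_in_hours = ssd_price / hourly_rate  # purchase price expressed as hours of income
    return (downtime_hours + price_in_hours) / lifetime_days


# Figures from the comment: $52,000/year, $520 drive, 8 hours lost, 227-day life.
per_day = break_even_hours_per_day(52_000, 520, 8, 227)
print(f"salary spread over all hours: {per_day * 60:.0f} minutes per day")

# Hypothetical variant: salary spread over a 2080-hour working year instead.
per_day_working = break_even_hours_per_day(52_000, 520, 8, 227, paid_hours_per_year=2080)
print(f"salary spread over working hours: {per_day_working * 60:.0f} minutes per day")
```

The result is quite sensitive to how the hourly rate is defined, so the threshold is better read as a rough order of magnitude than as a hard number.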

Sinthesis.wordpress.com on May 2, 2011 9:52 AM

I've been using an Intel X25-M SSD for 3 years and it "failed" 3 times.

Every time I've been able to make it usable again with the same procedure:

- Boot under a Linux live CD that includes a NTFS driver (for example, ubuntu's "try ubuntu" mode)
- Mount the SSD in read-only mode (mount -t ntfs -o ro...)
- Copy the disk's data to another disk. This will recover most data, except files that were being updated when the disk failed
- Use "ATA Secure Erase" to completely erase the SSD: https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase
- Repartition and reformat the SSD
- Restore the disk from a backup.
- Copy data that was recovered while the disk was mounted in read-only mode (to partially recover data that's more recent than the last backup)

I'm not sure if that would work with other cases of SSD failures, but it could be a solution instead of buying a new SSD every time. In any case, you would still need a good backup strategy and use a non-SSD for documents (except those that really need fast IO).

Christos on May 2, 2011 10:09 AM

Has anyone else noticed that, oddly, almost all the comments here report that their SSDs have lasted over a year? Jeff, can you mention what the usage pattern of these SSDs was? I have had a G1 X25-M 160GB SSD, among the first from the hot batch, which I got from an Intel guy to test drive their new babies. It has not failed in the last 2 or so years...

Itissid on May 2, 2011 10:11 AM

Jeff,
Would you please provide more details about these failures? You say:

"And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty."
Does this mean no data readable on any sector of any of the failed SSDs? Or was it recoverable when attached to another machine (such as outlined in Christos' comment earlier)?

Cheers,

J-b on May 2, 2011 10:31 AM

Several years back I got a TI-83 Plus, and the manual contained a warning that it is a bad idea to repeatedly archive and unarchive things from the flash memory because doing so could damage it. Ever since then I haven't trusted SSDs and I still think of them as far less reliable than a rotating hard drive. Thanks for helping confirm superstition for the 50th time.
As a side note, it wasn't until about a year or two ago that I finally accepted that the same technology that powered all of the painfully slow USB2 flash drives I have used over the years could ever be as fast as an HD (even the 5.4k ones).

GameFreak4321 on May 2, 2011 11:23 AM

I've been using a 160 GB Intel X-25M Gen 2 for 16 months or so in my laptop, for work and personal use, and it's still going strong. My OS and apps are on there, and then I have a pair of striped disks for other stuff. I have a Windows home server that does a decent job backing it all up...without that I'd have mirrored the other disks.

The SSD drives are expensive, granted, and I can see why somebody might try to justify the time savings...but to me that's not the point. I don't mind spending on things that matter to me, and taking the hit on other things. Given that I spend 8-10 hours a day on my laptop, the perception of speed makes me happier. It's a quality of life issue.

Mike on May 2, 2011 11:27 AM

I applaud your courage (deep pockets), but this technology is still immature. I'm a poor college student and as such, do not have the revenue to continually buy a new, extremely expensive piece of hardware.

I was on a single core laptop for like three years before I finally bought a custom desktop (3 core AMD Athlon II/4 gigs DDR3 memory/500 gig WD HDD). While my projects don't call for a stronger machine now (not that I could afford it anyway), I'd rather wait for the technology to get cheaper and a bit more mature.

Really, it's a matter of convenience vs. stability, and which one is more fitting for you. SSDs are not fit for everyone right now, because of the issues highlighted in the comments/post.

I do, however, await the day they become stable technology!

Coolelemental on May 2, 2011 11:35 AM

Surely there's a 2-3 year guarantee on those discs. If they fail every 8 months it must cost the producers an arm and a leg to replace them all. You have to provide 4 SSD's for the price of one.

Carrandas.blogspot.com on May 2, 2011 11:40 AM

I know I am tempting fate here but I have had three computers with SSDs for more or less the same period and I only had one failure (one SSD died after 24 hours of use but that also happens with other components). Two of them are in laptops: a Samsung second gen. with 60 Gb and an OCZ Vertex 2 120. In my desktop I have two first generation Samsungs with 60Gb each connected in RAID 0 (yes, ZERO not ONE :-).

But then I also have daily backups (just in case). I may be the odd one out but, as always, one only hears about the failures, not about the systems that keep working just fine.

Jose Simas on May 2, 2011 12:11 PM

I think Jeff just has a case of "too much money". Most home users don't have the money, or most certainly don't want to spend, $500 (OCZ Vertex 3 240 GB, newegg) every 6-12 months because a drive fails. It's not about the backups, because a 500 GB spinning platter for backup is cheap, and you should have this anyway. Sure if you have tons of money, or your employer is paying for the SSD, then go right ahead. But failing this often is just not an option. $500 would be 1/3 to 1/2 the price of a complete new system. I don't want that failing out every year.

Kibbee on May 2, 2011 12:40 PM

Jeff,

That is exactly what I'm seeing as well. As an early adopter of SSD's I've found the consumer grade stuff dies in non-fun ways. The total failure of the drive leads me to believe it is a manufacturing problem not a problem with the NAND flash chips themselves.

If you look at the specs most of them are rated at 10^15 on uncorrectable errors, same as their HDD brethren. What most companies won't tell you is the hard failure rate in the field. This alone is why I warn people away from putting consumer grade SSD's in their servers.

I blog about storage at http://www.sqlserverio.com and cover SSD stuff quite a bit.

WesBrownSQL on May 2, 2011 1:17 PM

Jeff, the company where I work has ordered SSDs for everyone. We can't wait for them to arrive!

SSD is a no-brainer for me. One commenter said it will save you a minute when you boot, so what - he must not understand what it's like to build a large enterprise application consisting of tens of thousands of files, when the disk I/O is a productivity-killing bottleneck. If a 30-minute build becomes a 5-minute build, we save 25 minutes per build; at a rate of X dollars per hour, with Y builds until the SSD dies, that comes to Z dollars, and there's our return on investment. Faster builds also mean more frequent builds and integrations, which should lead to higher quality of code. We need to put a dollar value on those improvements (fewer tech support calls, etc.) and include it in our ROI calculation.
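
The X, Y and Z above can be written out as a tiny sketch. Only the 25 minutes saved per build comes from this comment; the hourly rate and the number of builds are hypothetical placeholders, and the function name is made up for illustration.

```python
def build_time_savings(minutes_saved_per_build, dollars_per_hour, builds_until_failure):
    """Z: dollar value of the build time saved before the SSD dies."""
    hours_saved = minutes_saved_per_build / 60 * builds_until_failure
    return hours_saved * dollars_per_hour


MINUTES_SAVED = 30 - 5       # a 30-minute build becomes a 5-minute build
X_DOLLARS_PER_HOUR = 50      # hypothetical rate
Y_BUILDS = 1000              # hypothetical number of builds until the SSD dies

z = build_time_savings(MINUTES_SAVED, X_DOLLARS_PER_HOUR, Y_BUILDS)
print(f"Z = ${z:,.2f}")
```

This counts only the time directly saved per build; the quality benefits mentioned above would have to be estimated separately.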

I definitely see the ROI, so -- I don't think the hot-crazy "How I Met Your Mother" analogy is really necessary :-)

Alexei Zheglov on May 2, 2011 2:44 PM

Over at blekko, we've had 3 SSD failures after 1.5 years, out of 700 drives. These are Intel X-25M 160G2 drives.
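
For comparison with the other numbers in this thread, those figures can be turned into a rough annualized failure rate. This is a minimal sketch that assumes all 700 drives were in service for the full 1.5 years, which the comment does not actually state; the function name is made up for illustration.

```python
def observed_annual_failure_rate(failures, drives, years_in_service):
    """Failures per drive-year, assuming every drive ran for the whole period."""
    return failures / (drives * years_in_service)


afr = observed_annual_failure_rate(failures=3, drives=700, years_in_service=1.5)
print(f"observed annualized failure rate: {afr:.2%}")
```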

Greg Lindahl on May 2, 2011 2:56 PM

I was just looking at moving to SSDs last night. I'll probably wait until the new Intel drives come out later this year. However, I have a serious question.

Right now I run two drives in RAID 1, and I do full system image backups every couple of days to an e-go via USB. My question is, does it make sense to run RAID 1 with SSDs? My concern was that I would just be wearing down both drives just as quickly, without any striping benefit. However, if what you say is true about frequent SSD failure, then RAID 1 actually appears very smart. Thoughts?

Also, when these SSD drives fail, what do you do with them? Return them or chuck them out, or do you reformat them and wait until the next failure?

Bruce Woodcock on May 2, 2011 3:06 PM

Jeff
*I am scared*

#1
I am checking my SSD health status with http://ssd-life.com/ once a week.
Currently it says: Estimated lifetime: 26years, 11months 25days.

Do you see any value in tools like those?
Do they actually work? Or is my Friday afternoon paranoia check useless?


#2
Why do SSD drives fail? Do they all fail for the same reason?

Thanks
Interesting view on SSD drives and nice blog post!

Peter Gfader on May 2, 2011 4:21 PM

From the perspective of someone who recently downloaded and began to read "Our Choice" book from Al Gore, your SSD strategy isn't quite pointing the right direction, resources wise.
The fact that I'm the first to comment from this perspective is also worrisome... but let's be optimistic. People of the Nerdery, please consider your impact on the environment when getting new equipment and take a step in the right direction: favor stuff that will last over stuff that will save you 10 minutes per day. We live in a spaceship, supplies are limited, and the equation behind your choice needs to consider this too. Thank you.

Thibaut on May 2, 2011 4:42 PM

I've had my SSD (Intel X25M) in my desktop for exactly two years with no failure; the machine is powered on (full, e.g. no sleep/hibernate) 24/7.

The first thing I did in Windows was move the page file to a mechanical disk, along with the browser caches. The last thing you want is lots of small writes to the SSD. I only have the O/S and apps on the SSD. Data is on the RAID1 mechanical Velociraptors (of which, one has failed, and are the same exact age as the SSD). My experience with WD's raptor/velociraptor drives has been very poor which is why I RAID'd them in the first place. These mechanical drives are hardly used at all other than for the page file and browser caches. All my data sits on a 4TB RAID5 NAS with WD Caviar Green disks, which has had zero failures and is also powered on 24/7.

Robert Barth on May 2, 2011 5:22 PM

If you look closely at the warranties of SSDs, you will notice that they are shorter than those of their traditional rotational counterparts. Flash has well known issues such as write speed degradation over time, and a very limited lifetime for every cell.
I think one of the wiser approaches is to use flash as a non-volatile cache, so that when your flash crashes, you can have the peace of mind that your data is safe on a traditional HD while still enjoying the speed of flash.

Hyz on May 2, 2011 6:03 PM

I think the problem is that Jeff is speaking from a professional's point of view. The average Joe simply doesn't have that much money. (Of course there is a warranty, but in places where mail order isn't available, traveling to a repair center would probably cost 40% of the price of your SSD.)

And people just don't appreciate how far HDDs have advanced from a mechanical perspective.

That is why I think the latest Intel Turbo Cache on Z68 will finally make a difference. It requires a drive of at least 20GB, so the feature is actually guaranteed to speed things up. The previous generation allowed you to use as little as 2GB, which didn't help at all.

If you think about it, the files you use frequently, excluding multimedia files, don't actually add up to much. The frequently used parts of Windows 7 could fit within 3GB of space. (You don't need help files, drivers, backup files... etc.) Those will continue to live on the HDD. With RAM being very cheap, if you put 8GB of memory in, you could even disable the pagefile. I wouldn't recommend disabling it with less than 8GB, just to be safe. And to all those who argue that Microsoft recommends you keep the pagefile on: Microsoft actually has the pagefile OFF by default on its Windows Embedded PC version and some other versions.

The rumors point to Intel selling this "Larson Creek" 20GB SLC cache for only $5x. This should give you 90% of SSD performance for a relatively cheap price. And you don't have to worry about your SSD dying, because all of your data will still be intact on your HDD.

Edward Chick on May 2, 2011 9:57 PM

I recently became addicted to the speed of SSD's when I put a new Vertex 3 in my 2011 MacBook Pro. This thing is absolutely amazing and I was willing to sacrifice HD size (I could only find a 120G) in return for speed.

My computer boots in less than 11 seconds and opening applications like Photoshop CS5 and Illustrator is almost instant. Now I'm a little worried about the imminent failure of my SSD but I keep good backups and by the time it fails there will probably be something faster available.

Chris Olbekson on May 2, 2011 10:13 PM

Well isn't SSD FAST? You shave seconds here and there off some application starts (the first start, that is, if you're short on RAM, and I guess you're not). And then you stay off the computer for a full day at best to replace the drive, not considering other inconveniences like shipping the bad drive for replacement if it's under warranty, and restoring from a backup which is of course complete and taken the minute before your drive failed. Looks really ugly to me, and I write this using a one-year-old SSD that hasn't failed yet. It's time for a new backup.

Ovi - on May 3, 2011 1:04 AM

Well, has anybody thought about the ecological impact besides just thinking in $$$? SSDs will have a bigger CO2 footprint and use more rare earth source materials during production. The more of them we throw away, the more we pollute the environment. A hard disk is already bad enough, so let's not replace it with something which is even worse.

Guys, all I read here is ROI etc... we cannot continue like that, I think. Considerations must be made from all points of view, not only from the financial/economic one.

Flori on May 3, 2011 5:52 AM

Of interest to me is how these failure rates correlate to the operating system on which they are used. Are the failures consistent regardless of OS or is there a culprit?

chadly on May 3, 2011 6:45 AM

You may want to examine how you're using the disks. If I remember correctly, flash memory cells have a limited number of write cycles (it used to be 100,000 cycles, but I'm sure it's gotten better), and these cells are usually erased and written in blocks. I'm sure the number of cycles is pretty high, but any number of things that would cause continuous writing to the disk may cause it to die earlier than expected. For example, a daily virus scan or backup procedure would cause the last access time and archive flag on all of your files to be updated daily, resulting in a lot of unnecessary write cycles (I believe Windows can be configured to not update the last access time). Also, if you have a low memory system or do memory intensive work and the system swap file is located on your SSD, you may be doing continuous writing to the disk that would wear it out.
I guess in short, flash memory is good for fast storage but is simply not meant for continuous small random writing. I would disable last access file time updating (http://windows7themes.net/registry-tweaks-how-to-disable-last-access-filestamp-in-windows-7.html) and possibly even move your swap file to a standard hard drive. Then I bet you will see a much better life span.

c k on May 3, 2011 8:37 AM
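
The write-cycle reasoning in the comment above can be turned into a rough back-of-the-envelope estimate. The sketch below is only illustrative: the function name, the drive size, the daily write volume, the write-amplification factor and the lower cycle count are all assumptions chosen for the example, and only the 100,000-cycle figure comes from the comment itself. It also assumes ideal wear leveling.

```python
# Back-of-the-envelope SSD wear-out estimate.
# Assumptions (not from the thread): perfect wear leveling, a fixed daily
# write volume, and a constant write-amplification factor.

def estimated_lifetime_years(capacity_gb, pe_cycles, daily_writes_gb,
                             write_amplification=1.0):
    """Years until the rated program/erase cycles would be exhausted."""
    if capacity_gb <= 0 or pe_cycles <= 0:
        raise ValueError("capacity_gb and pe_cycles must be positive")
    if daily_writes_gb <= 0 or write_amplification <= 0:
        raise ValueError("daily_writes_gb and write_amplification must be positive")
    # Total data the flash cells can absorb before hitting the cycle limit.
    total_nand_writes_gb = capacity_gb * pe_cycles
    # Data actually written to flash per day, including amplification.
    nand_writes_per_day_gb = daily_writes_gb * write_amplification
    return total_nand_writes_gb / nand_writes_per_day_gb / 365.0


if __name__ == "__main__":
    # Hypothetical inputs: a 128 GB drive, 20 GB written per day, and a
    # pessimistic write amplification of 10.
    for cycles in (100_000, 10_000):
        years = estimated_lifetime_years(128, cycles, 20, write_amplification=10)
        print(f"{cycles} cycles -> about {years:.1f} years")
```

With inputs like these the estimate comes out in years rather than weeks, so cycle exhaustion alone may not account for very early failures. Real drives can fail for reasons this simple model ignores.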

What is considered a "generation" for a technology platform?

Jim Fell on May 3, 2011 11:32 AM

For a slightly different viewpoint:

I've had way too much experience with SSDs in the past three and a half or so years. All of it is with high-availability mission-critical embedded devices using surface-mount SSDs with IDE interfaces, quite a different form factor and application from the ones you are probably using. Most of my experience has been bad.

The SSDs fail catastrophically after a few hundred power cycles. That's just enough for them to get through system testing, as we were to find out, then fail hard in the field after a few months of use. This is not an issue of exhausting the write-cycles on the underlying flash. We do very few writes, and all the devices we use feature wear-leveling.

We know that the failure is in the NAND flash part itself. We can remove the flash from a working SSD and place it in a failing SSD and the failing SSD "revives". The SSD apparently has its own firmware in the NAND flash part, since we cannot use off-the-shelf uninitialized NAND flash parts of the same brand and model to the same effect.

Life is not good. I cannot recommend these kinds of SSDs to my clients for these kinds of applications.

(Sorry, NDA keeps me from saying much more.)

Chip Overclock on May 3, 2011 12:05 PM

I should also mention, this is not an issue with file system corruption. We can deal with that. Failure modes include: boot blocks or partition blocks being corrupted (we never write to those in the field), or the device quits responding to commands at all (this is when replacing the flash part revives it).

Chip Overclock on May 3, 2011 12:46 PM

Very odd.
I've had my "34nm 160 GB Intel® X25-M Mainstream SATA Solid State Drive" since 5/18/2010, and it works like a champ. I use the Intel SSD Toolbox as well.

John Simpson on May 3, 2011 1:45 PM

So, it appears, from the article at the bottom of this comment, that the "write limit per block issue" is no longer an issue. I'll search here and there to find a reason why they fail so easily (no, imported, "cheap," manufacturers won't count, as expensive manufacturers have the same problem).

One thing I've considered is that many electronics purchased from circa 2005 have had issues with electrolytic capacitors, usually used in the power circuit of electronic devices. Chinese electrolytic capacitors used in manufacturing of consumer electronics during that period (~2005) have had a short MTTF (mean time to failure) (see Samsung LCD monitors, most have faulty power supply circuits due to these shoddy capacitors).

Article is dated, but has some useful information:
http://www.storagesearch.com/ssdmyths-endurance.html

Jerold Haas on May 3, 2011 2:12 PM

That's way too many SSD failures to be a coincidence, and this does not jibe with statistics gathered from a broader base; I think your drives are being damaged by something external. Have you been plugging all those SSDs into the same power supply, by any chance? Do you run your wall power through a UPS? If SSDs were that unreliable in general, there would be users with thousands of them and failure statistics making noise about it.

Jim Babcock on May 3, 2011 2:41 PM

My computer at work has an SSD from circa Jan 2011, 12GB RAM and a 3.2GHz Quad-Core Xeon. My home desktop has a HDD from 2007, 2GB RAM and a Core 2 Duo.

The home desktop does what I want much faster, much smoother, much more responsively. The difference? Software.

Work machine = Win 7 + Visual Studio.
Home machine = Debian + XFCE + Geany.

HDD is not the main problem.

Nick Watts on May 3, 2011 3:58 PM

Here at the project where I work, we have been using HDDs and SSDs for quite some time, which amounts to a really good real-life test. We can say for sure that YES, SSDs really pay back their price in speed. I've written a post detailing the subject here: http://codemadesimple.wordpress.com/2011/05/03/ssdpower.

Codemadesimple.wordpress.com on May 3, 2011 5:13 PM

I've had 4 SSDs since 2007 (first was an $1100 32GB MTRON, then Vertex, Falcon, C300). No failures, plain ol' everyday MacBook/Xcode/PS use mostly. Not casting doubt on anyone else's bad luck, just another anecdote.

Stationstops on May 3, 2011 6:41 PM

Largely on Jeff's recommendation, my boss got 10 developers 256 GB SSDs (for about $750 each) ~2 years ago. They were blazing fast, but only one survived to see its first birthday. He even gave up sending them back to Crucial: after the first few were replaced and failed again, it became clear that any SSD replacement would inevitably cause yet another unexpected half day of downtime for that developer within the next year, and most likely at least some lost work even with daily drive image backups.

Despite my begging for another SSD, he won't let us touch them now. Some things are just too crazy for some people, no matter how hot they are.

I think a major contributing factor to early failure was the mandatory whole-disk encryption, but based on the nature of our business this is not negotiable. If they could encrypt natively on the drive using a manageable key scheme that kept our sysadmins happy, the drives might last longer and we would pay a premium for such drives.

Daniel Olsen on May 3, 2011 7:48 PM

@BloomCB - "That drive could nuke before I'm done with this post and I would be just fi"

I lol'd hard. Srsly, I wish I could think up stuff of that level of clever (maybe it was derivative from the 4chan 'sniper' threads, but yours was still gold because it was totally unexpected).

Lulz aside, it seems it's all about the little frequent writes. And @Daniel Olsen - if you're doing whole disk encryption, you're asking for trouble with SSD. But how can you & sysadmins not find a decent encryption protocol that doesn't use whole-disk? Even TrueCrypt (as one entry-level example) would suffice for 99% of things I can think of: it doesn't 'bleed' unencrypted data at all.

Kratoklastes on May 4, 2011 12:47 AM

I've got a production server running a database on two Intel X-25M (80GB) SSD's. The drives have been working flawlessly for 14000h by now.

Tomasz Jędzierowski on May 4, 2011 3:16 AM

My 300GB 10,000rpm Velociraptors have died in droves. I have 5 of them, 2 in RAID 1, 2 in RAID 0, and a spare. I have replaced 8 drives over the last 1.5 years. That's right, some have been replaced twice!

The MTBF numbers are very misleading.

What I don't understand is how I have had computers that lasted 6-10 years as hand me downs when I'm done with them on cheap consumer drives and all the "server" drives fail after so little time.

Mark Kovalcson on May 4, 2011 7:53 AM
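
One way to see why a quoted MTBF can feel misleading is to convert it into an expected number of failures and compare that with experience. The sketch below assumes a constant failure rate (the usual exponential model behind MTBF figures) and a round 1,000,000-hour MTBF that is purely an assumption, not the rating of any particular drive. The drive count and time span are the ones given in the comment above; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365


def annualized_failure_rate(mtbf_hours):
    """Probability that one drive fails within a year of continuous use,
    assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures(num_drives, years, mtbf_hours):
    """Expected failures across a fleet where failed drives are replaced."""
    return num_drives * years * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    assumed_mtbf = 1_000_000  # hypothetical rating, in hours
    print(f"AFR: {annualized_failure_rate(assumed_mtbf):.2%}")
    # 5 drives over 1.5 years, as in the comment above.
    print(f"Expected failures: {expected_failures(5, 1.5, assumed_mtbf):.3f}")
```

Under these assumptions the model predicts far less than one failure for five drives over a year and a half, which is a long way from eight replacements.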

I have shipped literally hundreds of Intel G1 and G2 SSDs to my customers and never had a single in the field failure (save for one drive in a laptop where the drive itself functioned fine but one of the contacts on the SATA connector was actually flaky, probably from vibrational damage from a lot of airplane flights, and one DOA drive). I think you just got unlucky there.

The other brands in question, however, kinda suck.

Andrew Sevrinsky on May 4, 2011 9:46 AM

The problem with SSD failure must be due to vibration and/or heat from use in laptops. I've had two Intel X-25M 80GB Gen 1 drives in my home desktop for a little over two years without failure, and I've had six Gen 2 drives in my work desktop for about a year without failure. At my office we have a couple more workstations with multiple X-25M's, and I think there has only been one instance of failure in the last year.

Msgflaw on May 4, 2011 10:00 AM

I understand that once they approach the 1TB threshold, the Bank of America principle will apply and they will become TOO BIG TO FAIL.

Kurt Merkle on May 4, 2011 11:43 AM

@Meg Noz Exactly. I can't believe that you are the only one to mention PCI-e!! I've been waiting all along for a sweet spot and then Revo came along and I am about to give up waiting.

Jack Blackhall on May 4, 2011 1:58 PM

OMG, my MacBook has been running for the last 365 days on an SSD. Time to make backups.

http://geeknizer.com

Taranfx on May 5, 2011 7:29 AM

We are running SSD drives at work. We've had several go out.

The worst part is that when we sent bad drives in on warranty, we received replacement drives where one drive had data from a guy named ScottW along with all his docs, email and VMs that he was using.

If you are using SSDs, you should really consider using BitLocker as well because your fixed drive could get shipped to someone else without any data being cleared.

Steve Hebert on May 5, 2011 3:39 PM

As I showed in my post here, sometimes it's not a matter of how long the SSD will last without failing, but just the additional speed it provides.

Let's consider, for example, that the project I work on is supposed to be delivered in 1 year... and we have A LOT of work. If we don't do it on time we will suffer some MAJOR penalties (this translates to: we have to pay a lot of money :D)

If you're talking about a win-or-lose situation (delivering the project on time), every minute that we can save is worth it. And as I explained in the post, in our case it's cheaper to buy TWO SSDs every month than to have the developer waiting for the build process to complete.

Time is money... and in our profession (software development) this phrase could not be closer to the truth.

Codemadesimple.wordpress.com on May 7, 2011 4:59 AM
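
The trade-off argued in the comment above (SSD spending versus developer waiting time) can be sketched as a simple monthly comparison. Every number and name here is hypothetical, chosen only to show the shape of the calculation, and the sketch deliberately ignores the downtime and lost work that a drive failure itself causes.

```python
# Rough monthly comparison of developer waiting time versus SSD spending.
# All figures passed in are hypothetical; downtime and lost work caused by
# a failing drive are not modeled.

def monthly_wait_cost(minutes_saved_per_day, hourly_rate, working_days=21):
    """Value of the waiting time an SSD saves one developer in a month."""
    return minutes_saved_per_day / 60.0 * hourly_rate * working_days


def ssd_pays_off(ssd_price, ssds_per_month, minutes_saved_per_day,
                 hourly_rate, working_days=21):
    """True if the time saved is worth at least the monthly SSD spend."""
    saved = monthly_wait_cost(minutes_saved_per_day, hourly_rate, working_days)
    return saved >= ssd_price * ssds_per_month


if __name__ == "__main__":
    # Hypothetical: two $300 SSDs per month, 60 minutes saved per day,
    # a developer costing $50 per hour.
    print(monthly_wait_cost(60, 50))
    print(ssd_pays_off(300, 2, 60, 50))
```

Whether the claim holds for a given team depends entirely on how much build time is really saved and what an hour of developer time costs.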

Ck:

The problem with suggestions to move your swap file off the SSD is that users are paying a considerable premium for faster drives so that they get stuff done faster. Putting one of the few things on your computer that it's really actively reading and writing (if your swap file is not trashing your SSD, that's because your computer is leaving it well alone) onto a slower drive ruins quite a lot of that advantage. Virtual memory is top of the list of all the things you want on an SSD instead of a hard drive to boost performance.

Stephen Rice on May 8, 2011 7:35 AM

Weird. I have two computers with SSDs.

My Main Workstation:
Intel X25-M 160GB OS
300GB velociraptor
750GB 7200RPM Western Digital (data)

I install programs that need a real boost, like Visual Studio, on the SSD. Programs that aren't that much of a hog I install on the Velociraptor. Most recently I moved my profiles to the Velociraptor. My pagefile has always been on my Velociraptor.

My Notebook
OCZ Vertex 2

Until about 2 weeks ago this was my only drive because I had a 15in Lenovo that would only take 1 drive. I just upgraded to a Dell 17 XPS and I have both the Vertex and the existing hard drive they sent.

I've had the OCZ Vertex 2 for over a year, and I've had the Intel X25-M since about 2 weeks after they released them (a long time). I've never had an issue with either one. Of course I go crazy on backups. I use a combination of CrashPlan/Acronis/FreeNAS w/ rsync to make sure I never lose data.

That being said, the speed increase with SSD is enough that I will never be without one. For those of you who are worried, you could use RAID 1, though it's going to double your cost.


openid.org/ncage1974 on May 8, 2011 8:13 PM

Here are some large-sample SSD reliability numbers from 2010:
http://www.anandtech.com/show/4202/the-intel-ssd-510-review/3

If you look at the source data (via a French article, http://www.hardware.fr/articles/810-6/taux-pannes-composants.html ) the SSD failure rate is in line with average 1 TB disk drives (aside from the Intel SSDs, which are much more reliable).

Anandtech's reviews of the Intel 510 and 320 series SSDs are interesting because he describes what's built in to the drives to improve reliability. The 320 also by default encrypts the NAND storage, meaning that securely erasing the disk is simply a matter of deleting the encryption key!

Andrew S on May 8, 2011 9:31 PM

[Sobbing]

I bought an OCZ Vertex 2 and it failed in the honeymoon period (first couple of months). I thought I could rely on it for the first few months so I didn't back up my data. Still struggling to find the bits and pieces...

Hemmmer on May 9, 2011 8:50 AM

DrPizza, you say "The fact of the matter is, outside a few corner cases, SSDs for desktop storage don't provide any particularly useful performance advantage". As someone who used to sell hard drives for a living, and who has a few traditional HDs (Mac, PC, Linux server with RAID) and several SSDs (netbooks and MacBook Air) I can say that this isn't even remotely true. The large majority of what users wait for on their computer is I/O, not CPU, and the I/O pattern for normal users is almost entirely random seeks (e.g. booting, launching apps, editing documents), for which traditional magnetic media are terrible, and SSDs are outstanding. Yes, it's nice that your OS boots in 15 seconds instead of 2 minutes, but it's far more important that your apps launch dramatically faster, your documents open and save faster, swapping when you're running many apps is faster, etc. The result, IMO, is that you're vastly more productive during the day. And don't get me started on how much faster compiling software is on an SSD - it's like 4x faster, which really changes how you can work.

I haven't seen (knock on wood) any SSD failures. I keep most important stuff backed up (on a ZFS RAID set on a server in the basement) but I have to confess that I'd assumed that SSDs were more reliable than magnetic drives (no moving parts). If the data supports this article's contention, I'll be a bit bummed. Perhaps I'd better buy an extended warranty on the laptop while I still can...

Laird Popkin on May 9, 2011 9:18 AM

For those interested, we've blogged about SSD Lifetime on the Super User Blog:

http://blog.superuser.com/2011/05/10/maximizing-the-lifetime-of-your-ssd/

You don't want your SSD to fail, do you?

TomWij on May 10, 2011 8:10 AM

I will never willingly "rent" a hard drive for a few months, I don't care how hot it is

Kevin Laity on May 11, 2011 7:53 AM

So, following this discussion, I'd be doomed if I buy the new Macbook Air and don't intend to buy a new one in the next 400 days or so?

Leandroico on May 12, 2011 7:01 AM

After reading this thread I can't help but wonder what operating systems were being used with the units that failed. I realize that it's primarily a hardware problem but I wonder if the choice of OS exacerbates the problem. It may be of some value to survey the failures by OS. It's possible that one/some OS will have a much higher failure rate per N installations than others.

Sea Green Sky on May 15, 2011 5:07 AM

"Consider that just in the last two years, their performance has doubled. Doubled!"

This isn't surprising, it's Moore's law, and while we shouldn't expect perpetual growth, Moore's law really hasn't let us down yet. I'd be more interested in a similar statement about their rated MTBF or perceived (by users) failure rate to be halved in two years, since failure seems to be the issue.

Paul Stallworth on May 18, 2011 8:36 AM

This thread reminds me why I quit code development long ago.
It's full of precious little egos.
The Peter Pan Syndrome, calling itself an "industry".

Unlike spinning magnetic hard drives, SSDs have limited write lifetimes. Your massive, bloated OSes and badly-written object-oriented-blahblah apps beat the crap out of them by constantly rewriting to certain files. Figure it out.

Harold Dumbacher on May 18, 2011 12:42 PM

Yesterday the OCZ Vertex 2 80Gb SSD that holds my Win 7 OS failed for the second time in 5 months (first time was after 1 week). This is on a home system that except for short periods of text or video editing (all stored on a hard drive) spends most of its time doing nothing more than acting as a print server. Fortunately I am well backed up and I keep an old 80Gb hard drive hanging around that I can quickly image my last backup onto, just for situations like these.

The only contributing factor I can think of might be some memory problems I've been having lately which caused some BSODs and random failures. I removed and reseated the RAM and things went back to normal then 2 days later my SSD crapped out (no further RAM issues though).

Like others I am swayed by the "Hot/Crazy" argument, as I love the fast boots and quick response time. For now, while OCZ is still providing free replacements I will stick with the SSD. It will be interesting to see how long the next replacement lasts...

Jim Grant on May 18, 2011 4:54 PM

A couple of months ago, the humble 8 GB SSD on my two-year-old Acer Aspire One netbook called it quits, from one day to the next. I tried reinstalling the OS, but data wouldn't stick to the drive anymore. That's another (anecdotal) data point for you.

Andres Cabezas Ulate on May 18, 2011 7:10 PM

Well, I think there are two major scenarios... If you are talking about PROFESSIONAL usage, meaning that you actually use your PC for WORK, not only games, Facebook, blogs, etc., then SSDs are a MUST! (Read it here for some clarification). If it's HOME usage, the data loss/failure problem (only if you don't have a backup policy) can be a blocker. In both cases, the tweaks mentioned on a lot of forums and articles about disabling the services and features that do unnecessary writes to the SSD (Windows Search, Defrag, etc.) ARE MANDATORY if you want your SSD to last. Personally, for me, HDD NEVER MORE! SSDs RULE BIG TIME :D!

Codemadesimple.wordpress.com on May 19, 2011 2:38 AM

You don't even need backups for your SSD, because of course you don't actually store any data there – you just install programs on it.

Joren Vermijs on May 19, 2011 12:06 PM

Hi. Let me start by saying that this article was really great, and I've subscribed to the blog by RSS so you'd better keep the good content rolling!

My main reason for commenting though was to ask a question. If SSDs have such a high fail-rate, what does that mean for devices like the ultraportables (Samsung's Series... 9? I think? And the MacBook Air) where the memory modules are soldered to the board and non-serviceable? Are extended warranties a must with these devices, and should users be wary about storing large amounts of data on their devices? It's more of a ponderable, I'm not purchasing one in the foreseeable future (I'm a netbook sort of guy. :P), but I'm curious as to what your take is.

Thanks,
Daniel

Danieldides on May 19, 2011 4:32 PM

The comments to this entry are closed.