Speed Hashing

April 6, 2012

Hashes are a bit like fingerprints for data.

Fingerprint-as-hash

A given hash uniquely represents a file, or any arbitrary collection of data. At least in theory. This is a 128-bit MD5 hash you're looking at above, so it can represent at most 2128 unique items, or 340 trillion trillion trillion. In reality the usable space is substantially less; you can start seeing significant collisions once you've filled half the square root of the space, but the square root of an impossibly large number is still impossibly large.

Back in 2005, I wondered about the difference between a checksum and a hash. You can think of a checksum as a person's full name: Eubediah Q. Horsefeathers. It's a shortcut to uniqueness that's fast and simple, but easy to forge, because security isn't really the point of naming. You don't walk up to someone and demand their fingerprints to prove they are who they say they are. Names are just convenient disambiguators, a way of quickly determining who you're talking to for social reasons, not absolute proof of identity. There can certainly be multiple people in the world with the same name, and it wouldn't be too much trouble to legally change your name to match someone else's. But changing your fingerprint to match Eubediah's is another matter entirely; that should be impossible except in the movies.

Secure hashes are designed to be tamper-proof

A properly designed secure hash function changes its output radically with tiny single bit changes to the input data, even if those changes are malicious and intended to cheat the hash. Unfortunately, not all hashes were designed properly, and some, like MD5, are outright broken and should probably be reverted to checksums.

As we will explain below, the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes:

  • Magnus Daum and Stefan Lucks have created two PostScript files with identical MD5 hash, of which one is a letter of recommendation, and the other is a security clearance.
  • Eduardo Diaz has described a scheme by which two programs could be packed into two archives with identical MD5 hash. A special "extractor" program turn one archive into a "good" program and the other into an "evil" one.
  • In 2007, Marc Stevens, Arjen K. Lenstra, and Benne de Weger used an improved version of Wang and Yu's attack known as the chosen prefix collision method to produce two executable files with the same MD5 hash, but different behaviors. Unlike the old method, where the two files could only differ in a few carefully chosen bits, the chosen prefix method allows two completely arbitrary files to have the same MD5 hash, by appending a few thousand bytes at the end of each file.
  • Didier Stevens used the evilize program (below) to create two different programs with the same Authenticode digital signature. Authenticode is Microsoft's code signing mechanism, and although it uses SHA1 by default, it still supports MD5.

If you could mimic another person's fingerprint or DNA at will, you could do some seriously evil stuff. MD5 is clearly compromised, and SHA-1 is not looking too great these days.

The good news is that hashing algorithms (assuming you didn't roll your own, God forbid) were designed by professional mathematicians and cryptographers who knew what they were doing. Just pick a hash of a newer vintage than MD5 (1991) and SHA-1 (1995), and you'll be fine – at least as far as collisions and uniqueness are concerned. But keep reading.

Secure hashes are designed to be slow

Speed of a checksum calculation is important, as checksums are generally working on data as it is being transmitted. If the checksum takes too long, it can affect your transfer speeds. If the checksum incurs significant CPU overhead, that means transferring data will also slow down or overload your PC. For example, imagine the sort of checksums that are used on video standards like DisplayPort, which can peak at 17.28 Gbit/sec.

But hashes aren't designed for speed. In fact, quite the opposite: hashes, when used for security, need to be slow. The faster you can calculate the hash, the more viable it is to use brute force to mount attacks. Unfortunately, "slow" in 1990 and 2000 terms may not be enough. The hashing algorithm designers may have anticipated predicted increases in CPU power via Moore's Law, but they almost certainly did not see the radical increases in GPU computing power coming.

How radical? Well, compare the results of CPU powered hashcat with the GPU powered oclHashcat when calculating MD5 hashes:

Radeon 79708213.6 M c/s
6-core AMD CPU52.9 M c/s

The GPU on a single modern video card produces over 150 times the number of hash calculations per second compared to a modern CPU. If Moore's Law anticipates a doubling of computing power every 18 months, that's like peeking 10 years into the future. Pretty amazing stuff, isn't it?

Hashes and passwords

Let's talk about passwords, since hashing and passwords are intimately related. Unless you're storing passwords incorrectly, you always store a user's password as a salted hash, never as plain text. Right? Right? This means if your database containing all those hashes is compromised or leaked, the users are still protected – nobody can figure out what their password actually is based on the hash stored in the database. Yes, there are of course dictionary attacks that can be surprisingly effective, but we can't protect users dead-set on using "monkey1" for their password from themselves. And anyway, the real solution to users choosing crappy passwords is not to make users remember ever more complicated and longer passwords, but to do away with passwords altogether.

This has one unfortunate ramification for password hashes: very few of them were designed with such massive and commonly available GPU horsepower in mind. Here are my results on my current PC, which has two ATI Radeon 7970 cards generating nearly 16000 M c/s with MD5. I used oclHashcat-lite with the full range of a common US keyboard – that is, including uppercase, lowercase, numbers, and all possible symbols:

all 6 character password MD5s47 seconds
all 7 character password MD5s1 hour, 14 minutes
all 8 character password MD5s~465 days
all 9 character password MD5sfuggedaboudit

The process scales nearly perfectly as you add GPUs, so you can cut the time in half by putting four video cards in one machine. It may sound crazy, but enthusiasts have been doing it since 2008. And you could cut it in half again by building another PC with four more video cards, splitting the attack space. (Keep going if you're either crazy, or working for the NSA.) Now we're down to a semi-reasonable 117 days to generate all 8 character MD5s. But perhaps this is a worst-case scenario, as a lot of passwords have no special characters. How about if we try the same thing using just uppercase, lowercase, and numbers?

all 6 character password MD5s3 seconds
all 7 character password MD5s4 minutes
all 8 character password MD5s4 hours
all 9 character password MD5s10 days
all 10 character password MD5s~625 days
all 11 character password MD5sfuggedaboudit

If you're curious about the worst case scenario, a 12 character all lowercase password is attainable in about 75 days on this PC. Try it yourself; here's the script I used:

set BIN=oclHashcat-lite64
set OPTS=--gpu-accel 200 --gpu-watchdog 0 --outfile-watch 0 --restore-timer 0 --pw-min 6 --pw-max 6 --custom-charset1 ?l?d?s?u
 
%BIN% %OPTS% --hash-type 0 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ?1?1?1?1?1?1?1?1?1?1?1?1?1

Just modify the pw-min, pw-max and the custom-charset as appropriate. Or, if you're too lazy to try it yourself, browse through the existing oclHashcat benchmarks others have run. This will also give you some idea how computationally expensive various known hashes are on GPUs relative to each other, such as:

MD523070.7 M/s
SHA-17973.8 M/s
SHA-2563110.2 M/s
SHA-512267.1 M/s
NTLM44035.3 M/s
DES185.1 M/s
WPA/WPA2348.0 k/s

What about rainbow tables?

Rainbow tables are huge pre-computed lists of hashes, trading off table lookups to massive amounts of disk space (and potentially memory) for raw calculation speed. They are now utterly and completely obsolete. Nobody who knows what they're doing would bother. They'd be wasting their time. I'll let Coda Hale explain:

Rainbow tables, despite their recent popularity as a subject of blog posts, have not aged gracefully. Implementations of password crackers can leverage the massive amount of parallelism available in GPUs, peaking at billions of candidate passwords a second. You can literally test all lowercase, alphabetic passwords which are ≤7 characters in less than 2 seconds. And you can now rent the hardware which makes this possible to the tune of less than $3/hour. For about $300/hour, you could crack around 500,000,000,000 candidate passwords a second.

Given this massive shift in the economics of cryptographic attacks, it simply doesn’t make sense for anyone to waste terabytes of disk space in the hope that their victim didn’t use a salt. It’s a lot easier to just crack the passwords. Even a “good” hashing scheme of SHA256(salt + password) is still completely vulnerable to these cheap and effective attacks.

But when I store passwords I use salts so none of this applies to me!

Hey, awesome, you're smart enough to not just use a hash, but also to salt the hash. Congratulations.

$saltedpassword = sha1(SALT . $password);

I know what you're thinking. "I can hide the salt, so the attacker won't know it!" You can certainly try. You could put the salt somewhere else, like in a different database, or put it in a configuration file, or in some hypothetically secure hardware that has additional layers of protection. In the event that an attacker obtains your database with the password hashes, but somehow has no access to or knowledge of the salt it's theoretically possible.

This will provide the illusion of security more than any actual security. Since you need both the salt and the choice of hash algorithm to generate the hash, and to check the hash, it's unlikely an attacker would have one but not the other. If you've been compromised to the point that an attacker has your password database, it's reasonable to assume they either have or can get your secret, hidden salt.

The first rule of security is to always assume and plan for the worst. Should you use a salt, ideally a random salt for each user? Sure, it's definitely a good practice, and at the very least it lets you disambiguate two users who have the same password. But these days, salts alone can no longer save you from a person willing to spend a few thousand dollars on video card hardware, and if you think they can, you're in trouble.

I'm too busy to read all this.

If you are a user:

Make sure all your passwords are 12 characters or more, ideally a lot more. I recommend adopting pass phrases, which are not only a lot easier to remember than passwords (if not type) but also ridiculously secure against brute forcing purely due to their length.

If you are a developer:

Use bcrypt or PBKDF2 exclusively to hash anything you need to be secure. These new hashes were specifically designed to be difficult to implement on GPUs. Do not use any other form of hash. Almost every other popular hashing scheme is vulnerable to brute forcing by arrays of commodity GPUs, which only get faster and more parallel and easier to program for every year.

[advertisement] Hiring developers? Post your open positions with Stack Overflow Careers and reach over 20MM awesome devs already on Stack Overflow. Create your satisfaction-guaranteed job listing today!
Posted by Jeff Atwood
63 Comments

And don't forget scrypt (http://www.tarsnap.com/scrypt.html) which does tend to offer a bit more protection against brute force attacks.

Mario Figueiredo on April 6, 2012 3:34 AM

If you are a user:

Make sure all your passwords are 12 characters or more, ideally a lot more.

What's one to do when a bank/insurance company/government website's server-enforced password policy is "6-8 numbers or letters, no special characters"?

adam on April 6, 2012 3:45 AM

"Take a look at what I'm wearing, people. You think anybody wants a roundhouse kick to the face while I'm wearing these bad boys? FUGGEDABOUDIT!"

Stolen Data on April 6, 2012 3:53 AM

> What's one to do when a bank/insurance company/government website's server-enforced password policy is "6-8 numbers or letters, no special characters"?

Cry.

But seriously: switch bank/insurance/government agencies if you can. Also try accented characters in UTF-8 to increase the entropy, if pôssïblé?

Jeff Atwood on April 6, 2012 4:10 AM

@Jeff, I'd rather use two factor. Wait... I already do. Google uses it. My bank uses it. I use Google to sign in with almost everything (OpenID FTW!).

Mircea Chirea on April 6, 2012 4:15 AM

While passphrases may well be better than passwords, it strikes me as folly to rely on memory at all. Only the most conscientious are going to maintain the kind of discipline that approach requires, given the dozens of logins most of us have.

I can't see anything for it right now other than to use software assistance of some sort. I use 1Password right now, but have used other systems. I have no idea what most of my passwords are, just that they're usually 15ch or more.

CB on April 6, 2012 4:36 AM

So what you're saying is that people need to use good, long passphrases to be secure? Gosh :)

One more tip I'd add is to hash the username together with salt and password. Obviously if you save the username together with hashes and the attacker knows your algorithms it doesn't make you much more secure, but either way it slows down some types of attacks.

Pies on April 6, 2012 4:38 AM

Hashing isn't just used for security. Everything you've said here is really only for cryptographic hash functions (SSL, password storage, document authentication, etc.).

It is certainly acceptable to roll your own generic hash functions, as used in what is probably the most popular data structure, hash tables. I will almost always know more about my data set than a generic hash function designer (e.g. java.lang.String.hashCode).

For this usage, security isn't a concern, only speed and reducing the possibility of collision within my data set.

http://en.wikipedia.org/wiki/Cryptographic_hash_function

William Furr on April 6, 2012 4:41 AM

Very good post! I wrote almost the same post on my blog last year: http://blog.ircmaxell.com/2011/08/rainbow-table-is-dead.html

One point though. Cryptographic hashes are supposed to be fast, not slow. The faster the hash, the better in that it is faster to verify the message. Otherwise, we'd be using iterated hash functions for signing data, which we're not. The primary defense against brute forcing is not the speed, but the shear size of the output space (which is why most modern hash algorithms are 256 bit or better). The speed of the function is seen as an advantage because it lets you quickly tell the validity of the message.

Now, for password hashing, this speed is a negative since it's very fast to brute force an entire search space (relative to documents). And that's why iterated password hashing algorithms (such as scrypt, bcrypt and pbkdf2) exist. To make the fast hashing function slower for its uses...

Otherwise, very well said...

Anthony

Anthony Ferrara on April 6, 2012 4:44 AM

If you happened to use www.thepasswordproject.com as background for this article, I'd appreciate a link :) If not, there is some hashcat stuff there could be interesting to your readers.

Tinius Alexander Lystad on April 6, 2012 4:48 AM

bcrypt is anything but new. In fact, it is one of the old kids around the block. However, people need to avoid using the broken implementation, revealed by CVE-2011-2483.

SaltwaterC on April 6, 2012 4:50 AM

Good post. However, the part about salted hashing is somewhat misleading. Salted hashes (in this form of usage, not in all) are there to slow down hashing, hiding the salt is (as you explain) pointless. This is also why shadow(3) stores hashes in the exact same line as the hash: It doesn't matter.

To explain this a bit more thoroughly: Salts must be chosen on a per-password basis in order to provide any hardening against brute forcing. They then provide the additional security that if two of your users (let's call them 'steve' and 'john') have the same password 'password' and you use their usernames as salt, their two password hashes will still be different, because H('stevepassword') and H('johnpassword') are completely different (provided you're using a cryptographic hash).
With unsalted hashes (or when the salt is the same for the whole database), as soon as anyone knows the hash of 'password', they can also check through all other rows if they have the same hash.
With salted hashes, you have to hash your entire character set against any different salt and check it against the corresponding row.

Sp0rkbomb on April 6, 2012 4:51 AM

It seems like focussing on the speed of cracking a password misses a more important point: in real work systems we can control the number or speed of logins. The way that's typically implemented is to only allow N wrong logins and then logging the user out. A better alternative in my opinion is simply to double the time between logins, making brute force attacks much slower.

Of course, that creates the possibility of a DOS attack to prevent a particular user from logging in (presuming you know the userid). There probably are good application-specific solutions there (e.g. registering the account to a specific IP or IP block, with a registration process if static IPs can't be guaranteed) and nothing beats having a good sys admin with proper network monitoring tools.

Chip McCormick on April 6, 2012 4:52 AM

@chip that is a totally different issue. All these attacks assume the database has been compromised and released into the wild. This happens all the time, it's quite common.

http://www.codinghorror.com/blog/2010/12/the-dirty-truth-about-web-passwords.html

(or an internal employee decides to hack the CEO's password..)

As for attacking websites, even with rate limited logins (of course a must) a basic dictionary approach will work wonders, see:

I don't want to draw any conclusions from such a small sample size, but at a minimum we can see that it would have been possible to crack more than a third of the user accounts in both leaks using five easy-to-get dictionaries.

Dictionary Attacks 101 describes a real life Twitter attack that worked this way.

Jeff Atwood on April 6, 2012 5:06 AM

Great tips about the server side. On the user side, I recommend open source password keeping software such as KeePass and Password Safe. I've been using Password Safe for years to generate long, secure passwords I don't need to remember.

Fernandoacorreia.wordpress.com on April 6, 2012 5:08 AM

"A given hash uniquely represents a file, or any arbitrary collection of data. At least in theory."

No. No. No

A given hash does not uniquely represent a file, or any arbitrary collection of data in theory, but if well designed it can in practice.

You actually understand this (because the rest of your article explains just why there are problems if the hash function is no longer deemed secure) but utterly misstate the problem in your first line!

The whole point of reasonable output sized crypto secure hashes for identity is that the probabilities of collision outweigh the fact that collisions are, in the infinite sense, inevitable.

Matthew Hope on April 6, 2012 5:17 AM

I usually use this function for password generation.

var generatePassword = function(username, password){
    var passwordData = {}
    passwordData.salt = hash(hash(username) + (new Date()).valueOf())
    passwordData.password = hash(passwordData.salt + hash(password))
    return passwordData
}

Yes, 4 hashes may be overdoing it. But this way even if they find a collision for the hashed password, chances that they'll actually find the plaintext version are slim.

Sofa420 on April 6, 2012 5:49 AM

While I was skeptical about your twitter post earlier, you've got some valid points. I've done some MD5 GPU experiments before but never managed to get quite the throughput you've got here.

Another slightly related point is that, apart from the maximum or charset requirements some sites put (minimum, okay, maximum, say what?!)..
I'd like to vent my utter frustration about so called "security questions". I'm shouting my cats name every night. Secure. So... very... secure.

And if you'll excuse me now, I'll try to remember my new 8-word passphrase. Thanks for the nice article!

sanderd on April 6, 2012 5:51 AM

Use scrypt if you have a choice, not bcrypt or PBKDF2: http://www.tarsnap.com/scrypt.html

Eitan Adler on April 6, 2012 6:05 AM

Good post but talking about hash functions being broken when your authentication system is a password is basically rearranging the deckchairs on the titanic.

Passwords are broken much, much worse than MD5 is broken.

Anybody with data to protect that thinks they can't tolerate the compromise of a password, probably shouldn't be using a password.

Any user relying on a password should assume it will be compromised and be ready to change it.

Fodwyer on April 6, 2012 6:20 AM

All this commentary on the weakness of MD-5 and SHA-1 and no mention of NIST's competition for a new cryptographic has algorithm?

http://csrc.nist.gov/groups/ST/hash/

Chris Upchurch on April 6, 2012 6:22 AM

"In reality the usable space is substantially less; you can start seeing significant collisions once you've filled half the space, but half of an impossibly large number is still impossibly large."

Actually, thanks to the birthday paradox, you can expect to see a collision after using the square root of the number of valid identifiers, and the page you linked to even says as much. You've made a common mistake: 2^64 is not half of 2^128.

Arachnid on April 6, 2012 6:33 AM

Sofa420: This looks like a waste of cpu cycles to me.

Salts have only one purpose: to make sure that the same password evaluates to a different hash. Therefore, it only secures against those famous dictionary-attacks (= rainbow tables).

The only thing that really matters for a salt, is that's unique.

In fact, your implementation might not be secure at all, if you also store that date in your database (as date/time of registration for example). This makes your salt predictable, and not really random, therefore defeating the purpose of a salt.

I think it's better to use a cryptographic number generator for this. Now if you are doing this in JavaScript, you don't really have one. Maybe you could use something like this (pseudocode) :

salt = hash(mouse_cursor_position+ screen_resolution + user_agent + ...)

a combination of this should be more random then taking the username and registration date as a salt. The extra hashing you do in your code, does nothing to prevent that.

Jeroen Jacobs on April 6, 2012 6:44 AM

There's one easy way to get people to use long passwords. Tell them to figure out an easy phrase to remember and add it to the end of their password every time. Done.

That way, "fred" (a terrible password) becomes "fredthehungergames" (actually a pretty good password). Same at every site. Increasing the length adds security far more than increasing the complexity of each character.

Richard Stanford on April 6, 2012 7:11 AM

Hi Jeff,

You've shared a lot of valuable information here, especially in regards to the password cracking part! However, I find the subheading "Hashes are designed to be slow" a bit too broad. If that was true, how would you explain the popularity of MurmurHash, Jenkins and FNV1, all of which are designed for speed? Your article would be more correct if you specify "Secure hashes" or "Cryptographic hashes" here (and a few other places) instead. Otherwise, interesting article as usual.

Preshing on April 6, 2012 7:52 AM

There is a security tradeoff with tunable workfactor based hashing systems (bcrypt and co) - it is really quite easy to launch a DoS against such a scheme since by design they consume system resources. Sure, that's not the same issue as a data breach, but if 2011 is any indication DoS is popular in a big way again among a certain particular populace.

As for salting - it has an exponential effect on the work to crack regardless of whether the salt is secret or not. with a salt you have to brute force passwords per user, since for a given user it is necessary to go through + user specific salt for every user specific salt.

JBW_1 on April 6, 2012 7:56 AM

All this is some pretty good advice, but a couple of points to tweak (since I'm an application security professional):

  • Terminology nit: hash functions produce digests. It doesn't matter most of the time, but when talking about both algorithms and what they produce, it's helpful to use the formal term -- hash for algorithm/function and digest for its output.
  • If you use a good enough hash algorithm (SHA2, Bcrypt, etc.) and a long enough random salt per-user, you are still in good shape against rainbow-table attacks.
  • Adding salt effectively increases the password length for the purposes of hash-table attacks; so an 8-char password with a 10-char salt is effectively 18 chars. Note that this does not substitute for long passwords for any other type of guessing attack.
  • The best thing you can do in your password policy is require long passwords. Complexity requirements only matter if the system is limited to short passwords for some reason. Increasing length increases the complexity of a brute-force attack exponentially, while complexity rules increase it linearly.

The short version: as a developer, the best things you can do are allow/encourage long passwords, choose a good hash algorithm, and use long and random salts per-user. (And you're right, you don't need to hide the salts -- that adds a lot of complexity for essentially no security value).

Dpmeyer on April 6, 2012 8:11 AM

Corrections:

  • Random collisions for MD5 start to become frequent when you have around 2^64 digests, which is the square root of the cardinality of MD5's output space, not 2^127, which is half the cardinality of MD5's output space. To provide k bits of security against random collisions, you need a cryptographic hash function providing 2k bits of output, not k+1 bits of output.
  • Cryptographic hashes functions are usually designed to be as fast as possible across a wide variety of currently-in-use hardware, including servers, desktops/laptops, handheld devices, and embedded chips. A password hashing function has the unusual requirement that it should be very slow to compute in addition to it being a good cryptographic hash function. However, most other uses of cryptographic hash functions - such as (1) HTTPS (2) the ETag header in HTTP responses (3) the session-cookie signing in every Rails application - require an extremely fast function and have no use for a slow hash function.

Jay on April 6, 2012 8:21 AM

I'd love to use pass phrases...

Except my bank only allows a maximum of 16 characters for a password.

And even better, they require security questions, which I can't just fill with garbage; every time I log in, I need to answer a random security question. So I need to either keep the security question answers honest so I can remember them or write them down somewhere.

Asmor on April 6, 2012 8:22 AM

We're doing something wrong. The fact there is this battle between better encryption and better cracking is just a never-ending arms race.

I am not sure the solution, but I think it is something along the lines of making the user database useless outside of the realm it is being used. Maybe have the entire database hashed against the administrator log-in and that information is never written to disk, just cached in RAM?

Tying it to the hardware somehow? I don't know what it is but surely there is a better solution.

Jeffrey Davis on April 6, 2012 8:24 AM

Jeff, you really should correct a few factual points here:

* This whole post is about secure hashes. There's such a thing as an (intentionally) insecure hash function, e.g. cityhash, spookyhash, or murmurhash.

In your nomenclature, all hashes are secure, and all insecure hash functions are checksums. This doesn't match the way the words are commonly used, as you can observe by the fact that these three insecure hashes all have "hash" in their name.

* Second, as others have pointed out, secure hashes are not designed to be slow. Some are. Most are not.

In general, I want SHA1 to be fast, because I use it for lots of things other than hiding passwords. If I have a lot of documents to hash, it's great if I can implement my hash function on the GPU! Indeed, if you look at the SHA3 candidates, you'll see that they're explicitly competing on speed.

Justin Lebar on April 6, 2012 8:35 AM

Jeffrey: such a system already exists: smartcards

No need for storing passwords in a database anymore, and it's perfectly usable on the web in combination with SSL.

Jeroen Jacobs on April 6, 2012 8:36 AM

If you could mimic another person's fingerprint or DNA at will, you could do some seriously evil stuff. MD5 is clearly compromised, and SHA-1 is not looking too great these days.

Collision attacks against MD5/SHA1 are nothing like mimicking another person's DNA; they are like the ability to create twins who share the same DNA. As you can guess from the lack of frauds committed by evil twins, that ability is not very dangerous. The same is true in cryptography: collision attacks can be problematic in some cases (such as when a hash is used to digitally sign a message), but completely irrelevant to cracking passwords.

As for figuring out the plaintext which hashes to a certain string, that is called a first preimage attack, and no such attack is known against either SHA1 or MD5 (or even MD4). So SHA1/MD5 is just as secure for password hashing as SHA-256 or any other fast cryptographic hash. (Which is to say, not very secure - as the article correctly explains.)

Use bcrypt or PBKDF2 exclusively to hash anything you need to be secure. These new hashes were specifically designed to be difficult to implement on GPUs.

I don't think PBKDF2 was specifically designed to be GPU-unfriendly - it was just designed to be slow.

Gergő Tisza on April 6, 2012 9:25 AM

Thanks. At first I thought your post was just a bunch of facts any professional programmer should have learned in college, but I had no idea how fast some of these hashes can be cracked nowadays.
Cool stuff

Tiago Almeida on April 6, 2012 11:15 AM

I agree with Jeroen Jacobs, there is the whole other side of hashing for speed and minimal collisions.

Joe Marquardt on April 6, 2012 12:57 PM

Jeff, you might be interested in a paper I just published on using hash digests for both message and content identity. We make it possible to figure out if two files, even if they might not be the same format, contain the same content: http://tw.rpi.edu/web/doc/parallelIdentitiesOGD

Jpmccu on April 6, 2012 2:43 PM

The early Unix password scheme involved running the hash iteratively - 1000 times IIRC. How effective is this in strengthening hashes, these days? Does running SHA256 on a password, 64k times, make it take 64k times as long to check a possible, using these GPU attacks?

jdege on April 6, 2012 4:00 PM

http://www.schneier.com/blog/archives/2012/03/the_security_of_5.html
http://www.lightbluetouchpaper.org/2012/03/07/some-evidence-on-multi-word-passphrases/

"The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users."

TL;DR - passphrases are better than passwords, but they're far from perfect.

John Tobin on April 6, 2012 5:21 PM

Also, securing the password exchange is becoming more and more important these days. Especially on non-SSL web servers (or on any non-encrypted connection). A nice way to do that is by using SRP: http://srp.stanford.edu/.

Samuel Lorétan on April 6, 2012 6:45 PM

A little humor for the day: On the subject of "long enough" passwords, saw this over at TheDailyWTF.com:

http://img.thedailywtf.com/images/12/q2/err7/pic4.png

Ted Oliverio on April 6, 2012 7:23 PM

Is there any way for a hacker to determine what type of hashing algorithm was used?

Let's say a hacker got hold of a password database, can he know given a set of hashes which hashing algorithm was used to generate these?

Lior Tal on April 7, 2012 2:46 AM

You could also increase the hash rounds which would artificially increase the time needed to calculate the password from the hash. See http://en.wikipedia.org/wiki/Key_stretching for more information.

Alexander Thaller on April 7, 2012 3:09 AM

Salt is not required to be secret at all, the idea of a random salt (and it should be unique for each user) is to make it impossible to use rainbow tables. Of course, attacker knows the salt since it's in the same DB table as a password, but he will have to brute-force each account separately, which makes it almost impossible to hack thousands of accounts at once. This is pretty much the best you can do, and the same approach is used by linux and many other systems for decades.

Also, when adding the salt one should always use HMAC, instead of simple concatenation of salt to the password. HMAC is developed specially for this, it benefits in better security as e.g. HMAC-MD5 does not suffer from the same weaknesses as MD5. I believe all major languages have support for it.

Ivanhoe011 on April 7, 2012 4:16 AM

Thanks for the Article! I've added it to the hashcat wiki: http://hashcat.net/wiki/

One note: I did not optimize my MD5 hash cracking code on hashcat CPU for a while. Most of my optimizations go into oclHashcat-lite first. While the 8213 Mhash/s on a stock clocked hd7970 is unbeaten, if I would port this code to CPU a stock clocked 6-core bulldozer it would run at 220 Mhash/s, not 50 Mhash/s.

Hashcat on April 7, 2012 8:41 AM

My bank, BMO, requires exactly 6 alphanumeric characters. No symbols or whitespace. They also use security questions, two of which I always get wrong.

Nico Bonada on April 7, 2012 3:22 PM

@sofa420: You've missed the point quite thoroughly. Nobody is going to try and find collisions for those hashes that would somehow return the original plaintext, they're just going to run a password dictionary through your algorithm to see if any of them gives them the stored hash.

Given that they will have access to your stored salts (they've got your database) all that algorithm does is double the amount of hashing/computing/time needed.

Jaroslaw Filiochowski on April 7, 2012 6:26 PM

@Lior Tal: If they got hold of the password database, I would assume they also got hold of the password checking code (be it scripted, compiled or publicly available). Even in the improbable case they didn't, they could guess the algorithm if they managed to generate and store their own password (known plaintext + digest).

Jaroslaw Filiochowski on April 7, 2012 6:50 PM

Another example of super high-speed checksum calculation is 40 or 100Gbps Ethernet (High Speed Ethernet) frames CRC check!

Arnaud Castaner on April 8, 2012 2:09 AM

@Lior Tal: The only sane way to approach security is to assume that the bad guys have everything you have, up to and including physical possession of the machine(s) (whether by a successful remote or by actually taking the machine). And understand that there is no such thing as absolute security (even booby-trapping your servers isn't a guarantee); all you can ever do is make things (to use the technical phrase) really, really frikkin' inconvenient for the cracker. Making the search space as large as possible (long, complex passwords or passphrases plus a unique per-user token/salt) and the transformation algorithm as slow as possible, while eliminating any inefficiencies that would make it trivially simple to speed things up (as with Bcrypt, Scrypt or PBKDF2) come under the category of "making things really, really frikkin' inconvenient".

And let's keep in mind that no matter how sanitary our own personal habits may be, most users tend to use the same password (or couple of passwords) *everywhere*, so the comment system on your kitty-cat blog (being a bit facetious) may well be the gatekeeper for your users' email, bank accounts, work accounts, and so on. I can't stress this enough: you are _never_ protecting just your own site with your password system.

Stan Rogers on April 9, 2012 12:22 PM

I've recently been working on an app with some crypto stuff in it, and i'm totally pumped that the exact two things you mentioned in your developers advice paragraph (Bcrypt and PBKDF2) are the two algorithms i'm relying on! Sweet :)

(if anyone's interested, it's called 'Skeleton Key Password Manager' on the iOS app store)

Chris Hulbert on April 9, 2012 9:04 PM

Hmmm, Wikipedia says:

One weakness of PBKDF2 is that while its c parameter can be adjusted to make it take an arbitrarily large amount of computing time, it can be implemented with a small circuit and very little RAM, which makes brute-force attacks using ASICs or GPUs relatively cheap[13].

Mirvnillith on April 10, 2012 12:12 AM

Can we use old hash algorithms but with a better salt? If the reason bcrypt is good is memory requirements perhaps we can use MD5 with a very long salt? Is there a salting method which does not simply concatenate salt with the password but uses a less GPU friendly logic?

Misha Golub on April 10, 2012 12:43 AM

how can i erase my finger prints. is it even posible. by the way send free text messages at textme4free.com

Mark Blanco on April 10, 2012 12:57 PM

@Misha Golub: As Ivanhoe011 pointed out above, there are alternative ways to use a salt, specifically HMAC variants of hashing. But that still leaves you with an algorithm that is relatively small both in time and space requirements, which still leaves you wide-open to fast brute force attacks.

A "system salt" is of very little value, since it represents only a minor modification to the hashing algorithm. A cracker can run a brute-force attack and look for ANY match in the user database. With a per-user salt (a nonce), the algorithm is effectively changed for every user, so brute-forcing can only be done one account at a time.

As for the space weakness of PBKDF2, that's a fact, but it's time complexity still makes it valuable for use in environments where space complexity would be a genuine problem. I am thinking specifically about low-cost (or free) shared hosting, where the space requirements for Bcrypt or Scrypt might trigger caps or suspensions. Again, there is no such thing as perfect security, the best we can do is to make things sufficiently inconvenient for the bad guys without overwhelming our own resources. I would have no compunctions about using PBKDF2 in a SME or personal site environment, where the available resources are likely to be significantly smaller than they would be on a dedicated/colo machine or in a multi-instance cloud environment.

Stan Rogers on April 11, 2012 1:09 PM

A cunning, attractive individual of the opposite sex can still get all of your information, the key to your house, the money in your wallet, and the clothes on your back, in just a week or two of telling you what you want to hear.

Hackers don't use software to nearly as much these days, not because security measures have improved, but because they can dig through your dumpster or lie to you on a social networking site and achieve the same objectives, for a tiny fraction of the time & effort.

You can build another Great Wall of China but it's pointless when your attackers are coming from anywhere but the North.

Justin Time on May 8, 2012 11:50 PM

It's too bad that I cannot find a bank that allows me to use pass-phrases of a usable length, much less any consistency between password requirements.

Password requirements


  • Wells Fargo

    • 6-14

    • at least one letter and one number.

    • It cannot contain nine or more numbers.

    • Allows @, %, &, #


  • Chase

    • 7-32

    • at least one letter and one number

    • Cannot include special characters (&, %, *, etc.)

    • Cannot be the same as any of the last five Passwords you've used

    • Cannot be the same as your User ID


  • American Express

    • 8 - 20

    • at least one letter and one number

    • Allows %,&,_,?,#,=,-

    • Is not case sensitive!!



  • Pay Pal

    • 8 - 40

    • includes both capital and lower case letters

    • Not a word you can find in the dictionary

    • Requires at least one 1-9,!,*,_,etc



I. on May 10, 2012 12:32 PM

Great tips for users and developers alike. I'm very interested to see how this field changes when quantum computers start to play a role. With the D-Wave One now sporting a 128-qubit processor chipset, it's going to get interesting soon. This isn't my field (by a very long shot), but it seems like hardware like that could supplant GPUs in calculating hashes and potentially force some big changes in how we secure... jeez, maybe everything.

Jaredchung on June 6, 2012 8:36 AM

Is the claim no one bothers with rainbow tables any more really accurate though? I note someone in reply to Anthony Ferrara's blog claimed that rainbow tables still have their uses.

I'm not involved in the field, but it strikes me too as probably wrong. Storage space isn't exactly expensive. Even with the recent spikes due to the Thailand floods, you can still have 8TB for less then US$500 or about the price of a high end GPU. This is more then enough to store all current MD5 tables from the distributed rainbow tables effort http://www.freerainbowtables.com/en/tables2/ with room for expansion either of your own work or future stuff from the distributed work. How much time will this save you? I don't actually know although my impression is it will often be faster then computing the stuff again. (Many of the tables apparently require GPU for use http://www.freerainbowtables.com/phpBB3/viewtopic.php?f=2&t=3255 .)

I'm not sure it's really a matter of just hoping your victim didn't use a salt. If you've targeting a specific victim or perhaps a few where security is paramount perhaps it's forlorn. But if you're say. a spammer breaking in to sites to try and steal their passwords hoping they were used for Facebook, twitter or email accounts you can then use for spamming, the recent high profile anonymous and LulzSec breaches have shown theres a fair chance you will have success many times since so many sites still don't bother to salt. Given the price of storage space, you probably only have to use your rainbow table a few times to make up for the cost. In other words, it's a numbers game and the evidence suggests to me rainbow tables will often still be a winner in some fashion.

Tet Yoon Lee on June 28, 2012 11:51 AM

See also http://news.ycombinator.com/item?id=2891654

Note that I'm not disagreeing with the basic premise, that with GPUs certain hashes can be calculated very fast so a salt is far from sufficient with out a fairly long password and a better hash function, simply pointing out from what I can tell, rainbow tables are still useful and still being used.

Tet Yoon Lee on June 28, 2012 12:05 PM

sanderd: The idea of the questions is to apply more than one human authentication approach (Something you know, have or are) without introducing any extra complexity such as credit cards or electronic signature.

More on the subject: http://www.cs.cornell.edu/courses/cs513/2005fa/nnlauthpeople.html

Ki6i on November 10, 2012 3:30 PM

In regards to portability the Zenbook is awesome. Just like the Macbook Air, it is easy to carry with you on air planes. Brought mine to China a few months back, I felt sorry for the guys lugging around old style Macbook Pros and PC Notebooks. Looks like they came straight out of the 90ies china manufacturing

Jerry John on December 31, 2012 4:13 AM

Bcrypt and PBKDF2 are not hashes! Sure, PBKDF2 may use cryptographically secure hashes internally, but it's a key derivation function, not a hash. You should never use plain secure hash, salted or not, to store passwords. Secure hashes, contrary to what this terrible in its sloppy wording article states, are always designed to be as fast as possible. Even the newest of NIST-approved hash functions, SHA-3 (Keccak), which is somewhat slow in software, is fast in hardware. See also BLAKE2 which is a state-of-the-art of hashes for software (it's based on another finalist of SHA-3).

As for the comments recommending scrypt over bcrypt and PBKDF, indeed, scrypt was designed to be sequential memory-hard and therefore more expensive to crack on ASICs than other password storage schemes, but experimental results showed that you need really a lot of memory, more than you can afford using on each login in most setups, for it to provide the same level of security as you can easily achieve with bcrypt today. Scrypt is the future, but for today you're probably better off with bcrypt (or you can combine them if you're super paranoid, but you'll probably do it wrong so stick to bcrypt for the time being).

k on April 3, 2013 12:54 AM

The comments to this entry are closed.