I <3 Steve McConnell*
Coding Horror
programming and human factors
by Jeff Atwood

December 3, 2007

Shuffling

Pop quiz, hotshot. How would you write code to shuffle a deck of cards?

a complete deck of 52 cards, arranged by suit and in ascending order

I was thinking about this after reading Mike's card-shuffling algorithm woes:

Here's where the non-CS mind comes into play. My first thought was to generate an unshuffled deck as an array-like structure -- all cards in order by suit. Then I'd create a second array-like structure. I'd walk through each card in the unshuffled deck, pick a random number, and insert the card at the randomly selected spot in the second array. If the randomly chosen position in the second array was already occupied, I'd choose another random number, see if it was used, and so on until the random selection happened to land on a free spot. I'll call this the Random Insert approach.

It seemed an odd approach to me, but unlike Mike, I have the dubious benefit of a programming background. I went immediately for my old friend, the loop. Let's assume we have an array with 52 members representing the 52 cards in the deck.

var rand = new Random();
for (int i = cards.Length - 1; i > 0; i--)
{
    int n = rand.Next(i + 1);
    int temp = cards[i];
    cards[i] = cards[n];
    cards[n] = temp;
}

So we loop through the deck, switching each card with another card from a random position in the deck. Seems straightforward enough, although I do wish there was a built in Swap command in the C# language to simplify the code a bit. It's eerily similar to the Knuth or Fisher-Yates shuffle, which doesn't mean I'm particularly smart, but that shuffling is an easily solved problem.

Or is it? This looks correct; there's nothing obviously wrong here. But there are two problems with this code. Can you see them?

The first problem is right here:

new Random();

Computers are lousy random number generators. Any shuffling you do, whatever the algorithm, will only be as good as your random number generator. So if you're running, say, an online casino, you need to be very careful when you start throwing around the word "Random" in your code. If you aren't careful, there will be.. problems.

The flaw exists in the card shuffling algorithm used to generate each deck. Ironically, the code was publicly displayed at www.planetpoker.com/ppfaq.htm with the idea of showing how fair the game is to interested players (the relevant question has since been removed). In the code, a call to randomize() is included to produce a random deck before each deck is generated. The implementation, built with Delphi 4 (a Pascal IDE), seeds the random number generator with the number of milliseconds since midnight according to the system clock. That means the output of the random number generator is easily predicted. A predictable "random number generator" is a very serious security problem.

By synchronizing our clock with the clock on the online casino and hitting the "shuffle" button, our program can calculate the exact shuffle. That means we know all the cards that have yet to appear, everyone's hand, and who will win. The screen shot below shows the information displayed by our program in realtime during an actual game. Our program knows what cards are to appear in advance, before they are revealed by the online game.

To be fair, this was 1999. I'd assume most online casinos have hired competent cryptographers and statisticians by now. With the ever looming specter of insider cheating and poker bots, they'd be fools not to.

The second problem with this code is that it's too complicated. Eric "purplicious" Lippert explains why, in his own inimitable way:

The standard way of implementing this algorithm is: associate each card with a random real number between 0.0 and 1.0. Sort the list based on its associated number. That's O(n log n) and has no bias.

As it turns out, the easiest way to implement a shuffle is by sorting. It's not exactly faster, as the typical sort is O(n log n) compared to the O(n) of the Knuth Fisher-Yates shuffle algorithm. We'll just sort by a random number-- in this case, a GUID.

var cards = Enumerable.Range(0, 51);
var shuffledcards = cards.OrderBy(a => Guid.NewGuid());

So we can ultimately implement a secure, unbiased shuffle as a one-liner in a modern programming language.

Which proves.. well, nothing, I suppose, but like so many other programming problems, there are a lot of ways to get shuffling wrong if you're not careful.

Posted by Jeff Atwood    View blog reactions
« Presentation: Be Vain
Please Don't Steal My Focus »
Comments

I had to randomly shuffle some data once for machine learning testing.

That's when I discovered the STL's random_shuffle method. Works pretty well.

Tom on December 4, 2007 2:14 PM

In Java:

java.util.Collections.shuffle(cardList);

http://java.sun.com/javase/6/docs/api/java/util/Collections.html#shuffle(java.util.List)

Arend on December 4, 2007 2:23 PM

No Swap built in? Don't cry, create! :-)
Seems like an case for extension methods:

public static void Swap<T>(this T[] vaData, int vnJ, int vnI)
{
T lnTemp = vaData[vnJ];
vaData[vnJ] = vaData[vnI];
vaData[vnI] = lnTemp;
}
...
cards.Swap(i, n);

titrat on December 4, 2007 2:25 PM

Shuffling 52 cards randomly requires quite a specialised random number generator, whether you use the Fisher-Yates method or the sorting method.

There are 52! (52 factorial) ways of arranging the cards, and 52! is a very large number - bigger than 2^225. Any random number generator used to generate fair shuffles must thus have an internal state of at least 226 bits to randomly generate shuffles. If it only has 32 bits (or even 128 bits), you can only hope to generate a minute fraction of the possible shuffles.

I'm not sure what properties Guid.NewGuid() has, but since there are only 2^122 GUIDs, it seems unlikely that it has more than 225 bits of internal state - can someone confirm or refute this?

Chris Johnson on December 4, 2007 2:29 PM

A really good random generator has NO state.
Or does have radioactive decay any state?
Only pseudo-generators need to mantain a state.

titrat on December 4, 2007 2:33 PM

In most implementations, however, random_shuffle uses the srand() seed, unless you provide it a RNG object of your own. And in both of these cases, your randomizer is only as good as the seed you give it. Too many people still use time() for a seed, which renders random_shuffle as useless as the randomize() example above.

Camille on December 4, 2007 2:33 PM

"I'd assume most online casinos have hired competent cryptographers and statisticians by now"

Err ... excuse me ? You "assume" that a given programming problem is solved competently by expert programmers ... and "most" ? Do you have ANY actual real life experience (I know you do, just saying) ?

J. Stoever on December 4, 2007 2:47 PM

No matter what algorithm you use a very important thing to keep in mind is to keep that implementation abstracted away. So, no matter how you do it, the programmer should do nothing more than call a shuffle function (that doesn't allude to a specific algorithm).

In Python:

import random
shuffled_deck = random.shuffle(range(52))

Cristian on December 4, 2007 2:47 PM

Fair point - if you have random numbers straight from radioactive decay then you're sorted. Most applications can't justify the hardware expense though.

Chris Johnson on December 4, 2007 2:49 PM

I was unfamiliar with the term GUID so I looked it up on wikipedia, but even there I didn't find any reason why it should be more random than a randomly generated number. For all I know, the system could just count upwards from 000....0001 for the GUID, or am I wrong?

Care to elaborate?

Winsmith on December 4, 2007 2:56 PM

> "I'd assume most online casinos have hired competent cryptographers and statisticians by now"

I assume that because there's a *lot* of money at stake for online casinos, unlike most programming jobs. And not just the "company's" money, which is somewhat disposable, but real people who get quite upset when you steal it from them.

Jeff Atwood on December 4, 2007 2:57 PM

Oh please, when has involvement of money ever made programs better ?

J. Stoever on December 4, 2007 3:01 PM

Same Ruby version I wrote at the original article:

cards.sort_by { rand }

Nice and simple.

Rik Hemsley on December 4, 2007 3:03 PM

I am surprised that many developers don't know that the pseudo random number generators that come with most programming languages - especially with the popular ones are weak. For the needs of cryptography you need to roll your own. It would be a good idea to get a good source of randomness - radioactive decay or some similar method is a good source but a bit non-practical unless you have a huge budget. Some encryption software make you type random characters or move a mouse on the screen making a human source of randomness. Again not perfect but better than nothing. This is a whole long topic on which a lot of computer science papers are based.

Tom on December 4, 2007 3:05 PM

reminds me of a project I was asked to write for a job interview. It was actually, to simulate a slot machine.

I was the only one to simulate machine behavior; everyone else just randomized.

Same sort of thing here... you're just randomizing a list/array/whatever. it doesn't simulate real world shuffling. That may be better than real world, or maybe not.

/just sayin'

josh on December 4, 2007 3:06 PM

I don't think the sorting approach is necessarily better than the swapping approach! It's more elegant but less performant (which matters for many applications).

You're still depending on a pseudo-random generator (only new it's called Guid not Random) so problem #1 is still there. Problem #2 is mitigated, but the performance is O(n log(n)) vs. O(n) with the swapping approach.

Shu Wu on December 4, 2007 3:13 PM

I wrote a program for a trading card game that allows the user to "simulate" their opening hand for the deck they created. While it isn't as critical to get it purely randomized, using the seeded random generator to grab a card from the deck and put it in a new one works well.

But the really interesting thing is that when kids (and adults) play this game in real life, they never shuffle like this. Most of the time, they like to "pile shuffle". Assuming you're using 8 "piles" you take the top card of the deck and put it in pile 1, the second in pile 2, and so on. When you go through all your cards (minimum of 40, no maximum) you then grab your piles at random and you have your deck again.

And there are other techniques too. There's your "rifle" shuffle where you split the deck in half and fan the cards together. There's the "clump" shuffle where you grab a clump of cards from the middle to back of your deck and bring them to the front.

I'm still working on making various shuffle algorithms into the program, but to some extent, the players want the deck randomized, but not too randomized, because after a round they have some of their better cards clumped together, and they want to keep it that way.

I wonder if casinos would ever use various styles in their shuffler, since most of it is not done manually anymore.

Sean Patterson on December 4, 2007 3:14 PM

Is a hand-shuffled and cut deck of actual playing cards really randomly sorted?

PJ Doland on December 4, 2007 3:28 PM

No swap in C++? What about std::swap in <algorithm>?

Puzzler183 on December 4, 2007 3:30 PM

... in <algorithm>?

Puzzler183 on December 4, 2007 3:31 PM

ping could be used as a random number, especially if you have many clients connected to the server (as online poker games probably do). Just collect the ping from all the players a few times every game, hash it to produce a standard string/byte array. The ping is "random" because it is influenced by a third party (the network, other users, other routers) which cannot be predicted.

Ihoss on December 4, 2007 3:36 PM

> Is a hand-shuffled and cut deck of actual playing cards really randomly sorted?

In my opinion, this is because physical methods can only *approximate* truly random shuffling. Computers, with clever random functions, can get much closer to true randomness than a human ever could.

Kind of ironic, really, since computers are about as deterministic as you can get, yet when properly programmed they can do much better than a human.

I'm not sure I buy the argument that we should duplicate inferior physical shuffling, because shuffling is somehow *intended* to be non-random. At least not in a typical Casino gameplay environment.

Jeff Atwood on December 4, 2007 3:39 PM

I definitely find that modeling real shuffling is more fun than actually randomizing the deck. My model for a bridge shuffle is:

1) Simulate imperfect cut
[set partition of array +/- X of center]

2) Two stacks. A force is applied on the bottom card of each stack proportional to the number of cards left in that stack (since the cards are bent, and can be treated roughly as springs). Probability comes from the proportion of these forces.

There are so many fun parameters here. Cut error, card flexibility (per-card?!), perhaps the shuffler's dominant hand influences next-card force?

Impractical, but a lot of fun :D Any other ideas for parameters?

Mike Lundy on December 4, 2007 3:41 PM

Is that algorithm reliable? It depends on the sorting algorithm to call the key function once for each element and cache the key. I don't see anything in the OrderBy docs that state that it does this. It could potentially call the key function multiple times resulting in the same element receiving different GUIDs and thus a non-deterministic end point.

Dave on December 4, 2007 3:44 PM

So far the only language I've seen with shuffling as a built-in array method is SystemVerilog (hardware programming/verification language).

engtech on December 4, 2007 3:46 PM

Thank you for not posting suck this time.

m on December 4, 2007 3:49 PM

I use my own random numbers code for my C/C++ programs, to have the same behavior from the same seed everywhere. And also to have some guaranties :D

A pseudo-random number generator with a long period (2^219937 &#8722; 1), fast (bits manipulation only), and good statistical properties is the Mersenne Twister. There is a special SIMD version of it, and it has some proof behind it. It simply rocks for simulations.

But the Mersenne Twister is also pretty easy to predict, just 634 outputs are neaded to predict it. The Blum Blum Shub generator is very hard to predict (I mean, you need a computer that not exists) while having good statistical properties. It uses product of big prime numbers so it also slow, that is not very important for a card game without AI. Blum Blum Shub, beside having a cool name, should be perfect for a poker game.

Alex on December 4, 2007 3:53 PM

josh and others: Trefethen, L N & Trefethen, L M, "How Many Shuffles to Randomise a Deck of Cards?", Proceedings of the Royal Society London, A 456, 2561 - 2568 (2000). The paper describes their model of "real shuffling."

Y'all should be careful about calling system or standard library "shuffle" routines unless you know how good the PRNG behind them is. Another good question is whether the PRNG is thread-safe (srand() / rand() isn't).

There's some interesting work out there about how to seed PRNGs from stock hardware by looking at uninitialized memory. You can also fetch "random" bits from /dev/random or /dev/urandom. Does Windows have something like this?

Mark Hoemmen on December 4, 2007 3:54 PM

Alex -- if you're using a PRNG for crypto or otherwise want unpredictable outputs, you should run it through a one-way function of some kind. The Mersenne Twister was designed specifically to improve the accuracy of Monte Carlo simulations; it's not at all meant for crypto or secure applications.

Incidentally, I've written a parallel PRNG tutorial at:

http://www.cs.berkeley.edu/~mhoemmen/cs194/Tutorials/prng.pdf

Mark Hoemmen on December 4, 2007 3:57 PM

How about some C++:

random_shuffle(cards.begin(),cards.end());

And if we want to use some serious randomness:

int rand_gen(int max, int seed=0)
{
boost::mt19937 mersenne_twister(seed);//uniform distribution
//in 623 dimensions
return mersenne_twister()%max;
}

random_shuffle(v.begin(),v.end(), rand_gen);
//or with a seed
boost::function< int (int) > f = bind(rand_gen,_1, seedval);
random_shuffle(v.begin(),v.end(), f);

Joel Eidsath on December 4, 2007 4:00 PM

I would not use GUIDs in your shuffling algorithm. Older GUID generation algorithms (such as found in VS6) actually encoded the current time as the first part of the GUID and the MAC address of the computer's network card in the last part of the GUID. The newer algorithm in Guid.NewGuid() eventually makes a Win32 call to UuidCreate, which generates a random GUID, though nothing in the docs says that it uses a non-pseudo random number generator. So I suspect that Guid.NewGuid() is no better than the pseudo random number generators that you're recommending against. If you're working in .NET, you want to use the RNGCryptoServiceProvider (http://msdn2.microsoft.com/en-us/library/system.security.cryptography.rngcryptoserviceprovider.aspx), which generates cryptographic quality random numbers.

James Kovacs on December 4, 2007 4:01 PM

@Mark Hoemmen : You should look at the wikipedia 'Blum Blum Shub' entry. It uses a well-known one way function : big prime numbers products. I cited it because of the bad security inherent to the Mersenne Twister ;)

Alex on December 4, 2007 4:04 PM

More evidence that physical shuffling is a pretty poor approximation of randomness:

http://www.flatrock.org.nz/topics/art_of_playing_cards/how_to_win_at_poker.htm
--
Two mathematicians hit the headlines in 1990 when they claimed that it took seven shuffles to make a pack of cards fully random, with no vestige of its original card order. Now comes a challenge to that claim. Five shuffles will almost do the job of randomisation, and six will make it certain, according to a paper in the Proceedings of the Royal Society*.

It all depends on how you define "random", say Nick Trefethen and Lloyd Trefethen, respectively of Oxford University, UK, and Tufts University in Massachusetts in the US.

The magic number of seven shuffles was proposed by David Bayer and Persi Diaconis in the USA, who considered how shuffling affected a quantity called the "total variation norm." This is a complicated measure of how effectively an ingenious gambler could be expected to win bets that the arrangement of cards in the deck was not completely arbitrary.

Bayer and Diaconis chose to define randomness in this way because they were interested in how shuffling might affect play in casinos. Some professional card players will go to any lengths to exploit an element of non-randomness in a card deck. Most casino dealers shuffle between two and four times - which, according to Bayer and Diaconis, was nowhere near enough to lose all trace of the starting configuration of the pack. In fact, they claimed that there is virtually no true randomisation, as they defined it, until the 5th shuffle. Only after the 7th, they said, does their measure of "orderliness" of the deck fall below half the initial value - enough to consider the pack well mixed.

Diaconis, himself a card magician, realised that astute card players could capitalise on this. He said that in his experience, shuffling seven times was almost unheard of in casinos. And he once spotted how bridge players in a New York club could anticipate the order of a deck shuffled four times between each hand.
--

Jeff Atwood on December 4, 2007 4:07 PM

First off, you could easily make a generic swap method if you wanted (I don't think there is one in .Net already; they do use make one in the documentation as an example of a generic method)

Second, you could use RNGCryptoServiceProvider to get much better random numbers. Random is made to be fast and easy; RNGCryptoServiceProvider is made to be secure.

Allied on December 4, 2007 4:14 PM

Are you showing off your lambda expressions knowledge? :P

Andrei Rinea on December 4, 2007 4:15 PM

"although I do wish there was a built in Swap command in the C# language to simplify the code a bit"

http://icr.vox.com/library/post/swapping-variables-in-c.html

[ICR] on December 4, 2007 4:21 PM

Why even bother to shuffle the array? It's probably easier and simpler to put the cards into a collection and then randomly pick one to remove based on the size of the Collection.

[code]
int pos;
String card;
List<String> d = newDeck();
while(!d.isEmpty()) {
pos = new Double(Math.random() * d.size()).intValue();
card = d.get(pos);
d.remove(pos);
}
[/code]

Mike on December 4, 2007 4:27 PM

Swap an integer, the XOR way:

ValueA ^= ValueB;
ValueB ^= ValueA;
ValueA ^= ValueB;

http://en.wikipedia.org/wiki/XOR_swap_algorithm

Shawn Poulson on December 4, 2007 4:33 PM

"Why even bother to shuffle the array? It's probably easier and simpler to put the cards into a collection and then randomly pick one to remove based on the size of the Collection."

I've used a similar approach in the past, though I seem to remember abandoning it for some reason. I can't quite remember why though.

[ICR] on December 4, 2007 4:33 PM

I'm reminded of one of Knuth's Exercises: "Look at the subroutine library of each computer installation in your organization, and replace the random number generators by good ones. Try to avoid being too shocked at what you find."

jpsa on December 4, 2007 4:35 PM

> Thank you for not posting suck this time.

No-- thank you!

Jeff Atwood on December 4, 2007 4:45 PM

@Alex -- duly noted ;-) though to the Mersenne Twister's credit, the author specifically indicated that MT was _not_ crypto-secure. The author's implementation of MT is a chunk of highly optimized C code and is likely to be faster than good crypto-secure PRNGs, though I haven't run benchmarks. MT grew out of the need for accurate Monte Carlo simulations.

Mark Hoemmen on December 4, 2007 4:46 PM

""Why even bother to shuffle the array? It's probably easier and simpler to put the cards into a collection and then randomly pick one to remove based on the size of the Collection."

I've used a similar approach in the past, though I seem to remember abandoning it for some reason. I can't quite remember why though."

It's more security conscience as well. Assuming the random number generator is good the only thing you can predict, via memory inspection, is what cards are left. As we all know people count cards in the casinos as it is, so it's not like that'll be giving anything away.

Mike on December 4, 2007 4:46 PM

"var cards = Enumerable.Range(0, 51);
var shuffledcards = cards.OrderBy(a => Guid.NewGuid());"

How does
var shuffledcards = orderby(a=>Guid.NewGuid())
work?

I wouldn't think you could sort by a guid considering its not part of int.

Brian on December 4, 2007 4:56 PM

Years ago I was tasked with creating a card game, and given instructions to make it as realistic as possible, as it would be used to simulate actual card playing. At first, I thought about a true random shuffle, then I sat at my desk with a pack of cards, and actually shuffled, and I noticed that none of my shuffling was really very random at all.

The two main methods I was using were cutting the deck, and riffle shuffles. I decided to implement these, along with a few other shuffle methods directly in the code, and then call random shuffle methods, a random number of times. For example, you might have a cut called with an offset of 30, which would yield a top pile of 22 cards, and a bottom pile of 30, then the top pile would be moved to the bottom, followed by a riffle shuffle with an offset of 25, where there were a pile of 25 cards and a pile of 27, which were then combined one after another, with some weighting to determine the likelihood of two or more cards being grouped, followed by another cut.

In the end, the algorithm gave a more natural shuffle so to speak, and was much closer to the results of a human physically shuffling the deck. Given humanities ability for complex subconscious pattern recognition, it wouldn't surprise me at all if normal nonrandom shuffling behavior affects poker betting, on and offline.

jcnnghm on December 4, 2007 5:04 PM

This reminds me of a blackjack program on the VIC20 that got much slower towards the end of the pack because it was looking for a card it had not already used.

Speeking of seeding with milliseconds, even if you have your computer clock set at the same time precisely to the millisecond, how do you know exactly when the random numbers are being generated?

Those pesky preemptive windows messages also get in the way and disrupt the timing.

Andrew on December 4, 2007 5:31 PM

There's a third problem with the first method, and that's got to do with the probability actually not equaling 1/<count> after you start choosing new ones. I can't remember exactly, but I recall doing something quite similiar in school.

TraumaPony on December 4, 2007 5:46 PM

@Mike Random removal from a mutable array is the same approach I used last time, and the same approach I'll use next time, unless someone shows me why it's wrong.

Mike Lee on December 4, 2007 6:16 PM

@Shawn Poulson
> Swap an integer, the XOR way:

Cute, but that isn't as efficient as using a temp variable. Any optimizing compiler would locate t in a CPU register, using no actual memory (if we use integers). The xor method would move both variables into registers first before being able to operate on them.

Bill on December 4, 2007 6:25 PM

"Speeking of seeding with milliseconds, even if you have your computer clock set at the same time precisely to the millisecond, how do you know exactly when the random numbers are being generated?"

If you know roughly the time the RNG was seeded and the two cards you were dealt as well as the cards of the flop the cheat program could check a range of seed times until it gets the same cards. Looking at the screenshot on the linked page it seems that is exactly what is happening.

Jeff: Thanks for the interesting post!

unthinkableMayhem on December 4, 2007 6:31 PM

In Ruby:

(1..52).sort_by { rand }

Stephen Touset on December 4, 2007 6:49 PM

True random number generators are pretty easy. You don't need nuclear decay. Actual random number generators are pretty cheap, and I bet must gambling sites have them. A DIY one is also easy. Plug a microphone into your sound card, put it next to the power supply fan, and read the sound coming in. Filter out all but the very highest frequency components, and you have a pretty strong source of randomness.

FraserOrr on December 4, 2007 7:41 PM

SELECT CardNumber FROM dbo.Cards ORDER BY NEWID()

Dustin on December 4, 2007 7:52 PM

This reminds me of how I was taught sorting algorithms in high school and a paper I wrote later. My teacher used a deck of cards to teach me the basic sorts (Shell, Bubble, Quick, Insertion etc. etc.) In the paper I wrote I included an algorithm that was the least efficient (in terms of lines processed to achieve a sorted array).

The algorithm was the equivalent of throwing a pack of cards in the air and checking if it is sorted. If not, try it again. I used the real number between 0.0 and 1.0 approach in the algorithm.

I wish I could include the algorithm and the results here but I am currently at work.

Albert on December 4, 2007 8:05 PM

What does a GUID get you that, say, generating a number based on a good /dev/rand implementation not get? GUIDs are designed to be unique globally -- across machines, etc., etc., which means that many implementations have a lot of redundancy (aka "waste") when used as random seeds on a local box.

I do need to generate a number from a field larger than 52!, but that's still just generating a number, so why not just do that?

One of my biggest beefs in software development is sandblasting the salt off a soup cracker: do what you need to do, and no more.

Robert Fischer on December 4, 2007 8:16 PM

Nothing is ever truly random, just confusing enough to make a person think it is.

Mattkins on December 4, 2007 8:16 PM

Could you discuss detail as to why generating a guid is more "random" than calling random? And, is that REALLY a one-liner? It doesn't look like you're storing the GUID's anywhere- how is it comparing against the GUID's already representing other cards elsewhere in the deck?

Also, @Shawn: Swapping values "the XOR way" sets both your ints to 0 if they have the same value:D I read an article once citing that snippet as an example of "the dangers of clever code"

Alex Lucas on December 4, 2007 8:19 PM

@ Chris Johnson
RE: There are 52! ways of arranging the cards [...] any random number generator [...] must thus have an internal state of at least 226 bits

I tend to agree, though talking about the internals of a PRNG to determine it's suitability to a specific fixed task seems, to me, to be missing a crucial point. So many programmers treat PRNGs as something to initialise once and then use to get a potentially infinite number of random bits out of. This is wrong. A PRNG has INPUTS as well as an output. You should be constantly feeding the inputs with anything even remotely random that you can lay your grubby mitts on... keystroke timings, hard-disk response times, bytes received at network layer, CPU cycle count, ring-0 context times, etc. Everything. Anything. All the time.

The Fortuna PRNG by Bruce Schneier & Niels Ferguson might have potentially 10s of thousands of bits in the entropy accumulators - from which it can generate an output stream. It is really an impressive piece of work and given enough diverse inputs rapidly achieves immunity from poisoning.

Andrew R on December 4, 2007 8:31 PM

Your comment about O(n) for Knuth Fisher-Yates shuffle algorithm is ignoring a subtle detail. Since there are n! different permutations and representing n! takes log(n!) = O(nlog(n)) bits. So sampling *uniformly* from all permutations will take at least time O(nlog(n)). Of course we can assume that n fits into a 32 bit number and we're fine, but for ginormous numbers this would become an issue.

Pall Melsted on December 4, 2007 8:42 PM

@Andrew R
RE: You should be constantly feeding the inputs with anything even remotely random that you can lay your grubby mitts on... keystroke timings, hard-disk response times, bytes received at network layer, CPU cycle count, ring-0 context times, etc. Everything. Anything. All the time.

I believe there are ways in which this can bite you. Most of us are probably ill-equipped to even measure the true randomness of any such source of entropy, for one thing. I certainly am.

There are -- as you point out -- expert-designed schemes for entropy distillation and accumulation. When true randomness is important, perhaps we should use those systems, and not just "roll our own."

Western Infidels on December 4, 2007 9:55 PM

Jeff, I actually found your initial method to be "weird" - so to speak. I've always used the "assign a random id and sort" method myself, can't even remember if I saw it online or came up with it; but even if it weren't for the big-O, it's still the "logical" way of doing it.

Correct me if I'm wrong here: When you replace each card with a randomly selected card, you're going to swap some cards around twice and some cards around once; because as you move along the deck, you can end up swapping each card with one either above or below it. Then when you get to _that_ card, you swap it again, and so on and so forth.

This is closest to the behavior of a "manual" shuffle.

But when you assign a random integer to each and sort by it, you're guaranteeing that all cards have the same probability of being shuffled.

Both end-results are as "random" as each other, but the second method gets you a more "equal" randomness - which is something that counts in cryptography.

Computer Guru on December 4, 2007 10:32 PM

Although you've correctly identified some of the things a secure card shuffling algorithm needs to do, you've missed a lot of others. Chris Johnson brought up one issue in his comment above, and you can read some more at http://en.wikipedia.org/wiki/Shuffling#Implementation_of_shuffling_algorithms

Hey, where are the test results? You claim that your algorithm is a "secure, unbiased" shuffle, but is it? Is it even better than the built-in RNG? Please run your shuffling algorithm through a test battery like NIST or TestU01 and let us know how that goes.

http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
http://www.iro.umontreal.ca/~simardr/testu01/tu01.html

[Also: When n=52, O(n) notation doesn't mean squat. For small n, an O(n log n) algorithm can easily beat an O(n) algorithm depending on the constants involved. O(n) analysis generally assumes that memory access is O(1) and arithmetic operations are all equally fast, but we all know neither of those assumptions is true. At modern CPU speeds, memory access patterns and CPU instruction scheduling dominate the cost of many algorithms -- especially RNGs.]

Michael B on December 4, 2007 10:47 PM

You should see what pokerstars.com writes on its website about its shuffle. For the random number generator, it supposedly uses, among other things, mouse movements of the users!

Greg

Greg Magarshak on December 4, 2007 11:07 PM

Funnily enough, I wrote a quick implementation of Knuth’s shuffle in Haskell a couple of days ago; http://hpaste.org/4216#a1.

Porges on December 4, 2007 11:12 PM

All right, everyone who didn't mention TAOCP Vol-2, hand in your geek cards right now, you're all banished for the next 12 months.

For the newbies out there that are scratching their heads asking "what the fuck is this TAOCP thingie, just google it."

smallstepforman on December 4, 2007 11:35 PM

My 1st game was a Tetris clone. I made it in Turbo Pascal 3.0!
I could not find a random() function so i made my own by using a pattern of digits returned from logarithmic functions.

(*) If you want secure pseudo random numbers I think its a really BAD idea to use known methods.

Nikos on December 4, 2007 11:44 PM

>How would you write code to shuffle a deck of cards?

By going to a book of algorithms and looking up the optimal solution.

(I assume the question is a variant of the proposition that the best tool for repairing a TV is a TV repairman).

Dave on December 5, 2007 12:15 AM

My first impulse was to randomise the cards as I was filling the array/list (populating a parallel set to check for dupes), not actually shuffling an existing deck. This, and the "pick a random card from the collection", method does not come from the idea of a pre-existing ordered deck and I think that's a good thing (because there is no "order" to be unexpectedly exposed should the randomness be less than expected).

Jonas on December 5, 2007 12:26 AM

Here's a clip from the PokerStars website (they did their homework):

[quote]

SHUFFLE

"Anyone who considers arithmetic methods of producing random digits is, of course, in a state of sin." - John von Neumann, 1951

We understand that a use of a fair and unpredictable shuffle algorithm is critical to our software. To ensure this and avoid major problems described in [2], we are using two independent sources of truly random data:

* user input, including summary of mouse movements and events timing, collected from client software
* true hardware random number generator developed by Intel [3], which uses thermal noise as an entropy source

Each of these sources itself generates enough entropy to ensure a fair and unpredictable shuffle.
Shuffle Highlights:

* A deck of 52 cards can be shuffled in 52! ways. 52! is about 2^225 (to be precise, 80,658,175,170,943,878,571,660,636,856,404,000,000,000,000,000 ways). We use 249 random bits from both entropy sources (user input and thermal noise) to achieve an even and unpredictable statistical distribution.
* Furthermore, we apply conservative rules to enforce the required degree of randomness; for instance, if user input does not generate required amount of entropy, we do not start the next hand until we obtain the required amount of entropy from Intel RNG.
* We use the SHA-1 cryptographic hash algorithm to mix the entropy gathered from both sources to provide an extra level of security
* We also maintain a SHA-1-based pseudo-random generator to provide even more security and protection from user data attacks
* To convert random bit stream to random numbers within a required range without bias, we use a simple and reliable algorithm. For example, if we need a random number in the range 0-25:
o we take 5 random bits and convert them to a random number 0-31
o if this number is greater than 25 we just discard all 5 bits and repeat the process
* This method is not affected by biases related to modulus operation for generation of random numbers that are not 2n, n = 1,2,..
* To perform an actual shuffle, we use another simple and reliable algorithm:
o first we draw a random card from the original deck (1 of 52) and place it in a new deck - now original deck contains 51 cards and the new deck contains 1 card
o then we draw another random card from the original deck (1 of 51) and place it on top of the new deck - now original deck contains 50 cards and the new deck contains 2 cards
o we repeat the process until all cards have moved from the original deck to the new deck
* This algorithm does not suffer from "Bad Distribution Of Shuffles" described in [2]

PokerStars shuffle verified by Cigital and BMM International

PokerStars submitted extensive information about the PokerStars random number generator (RNG) to two independent organizations. We asked these two trusted resources to perform an in-depth analysis of the randomness of the output of the RNG, and its implementation in the shuffling of the cards on PokerStars.

Both independent companies were given full access to the source code and confirmed the randomness and security of our shuffle. Visit Online Poker Random Number Generator for more details.

[2] "How We Learned to Cheat at Online Poker: A Study in Software Security" - http://itmanagement.earthweb.com/entdev/article.php/616221
[3] "The Intel Random Number Generator" - http://www.cryptography.com/resources/whitepapers/IntelRNG.pdf"

Mike on December 5, 2007 12:41 AM

You should see what pokerstars.com writes on its website about its shuffle. For the random number generator, it supposedly uses, among other things, mouse movements of the users!

Greg

Greg Magarshak on December 5, 2007 12:50 AM

Ok, so PokerStars has done it's homework. Any more good examples?

Good to know that the shuffles are properly done when you feel like playing Poker.

djchester on December 5, 2007 1:31 AM

Jeff, I'm disappointed. There are two things I completely disagree with here.

Calling up a new GUID and using it to shuffle is too simplistic and abstracted. You're asking for someone to come along and spend a some time reverse engineering their way back to a point where they can find a shuffle through a rainbow-like attack. Where's the 'random' in GUID creation? How do you know it's random enough? GUID's aren't always perfect, and it could be possible to track them down if you were to use a static function like that. This is a problem where even 'theoretically possible' is not good enough. Ergo a custom built algorithm with a phenomenal amount of randomness is the only worthwhile and fair solution.

The more important thing you forgot is that as soon as you put money on a table, it belongs to the casino. A better encryption or shuffling algorithm is only advertising and customer retention. The goal of any casino is to pull as much money as possible with the odds on its side onto the table, (or pull out a percentage) then keep the players playing as long as possible. There is inefficiency in the system because the system is designed by those creating the inefficiency.

A casino programmer's job is to bring in as many players as possible and make standard distribution at least APPEAR to apply so you can continue siphoning money from more and more people... I mentioned fair solutions, but it may not be in the casino's best interest to have a fair solution all the time.

Dylan Brams on December 5, 2007 1:47 AM

I do not think that shuffling is an "easy solved problem", especially since the shuffle algorithm you describe is very easy to get wrong. In fact, almost all swap-based algorithms are wrong.

You already made a mistake in your textual description: "So we loop through the deck, switching each card with another card from a random position in the deck". That is wrong (doing just that would produce a biased shuffle) and it is also not what you are doing: you are not switching with any random card, but only with a random card that hasn't been swapped yet!

The subtle difference makes the difference between a biased and an unbiased algorithm (assuming the RNG is fair). Your textual description does not match your code, which is why the code is correct while the description is wrong.

A better way to describe the algorithm that you actually implemented (that is probably easier to understand and analyze as well) would be:
- At the start the shuffled deck is empty, and all (unshuffled) cards are in a pile.
- Remove a card from the pile at random, and add it at the end of the deck.
- Repeat until the pile is empty.
The deck is now shuffled without bias.

The reason you use swaps at all, is just to simulate the behaviour describe above without having to allocate a separate data structure. In your for loop elements at index 0 through i represent the (unshuffled) pile, and elements at index i through (cards.Length) are the deck (you add cards in the front, in this implementation).

Maks Verver on December 5, 2007 1:55 AM

To make matters worse, your code for the shuffling-by-sorting is wrong. A sorting algorithm is only guaranteed to do anything meaningful if you call it with a partial ordering of the data. If you generate a different random key for the same element every time it is evaluated, the keys do not describe a partial ordering.

It probably works in practice because of the way the sorting algorithm works behind the scenes, but the correctness does not follow from the code. Some algorithms may not even terminate (or more correctly have a low probability of terminating) if you do not define a consistent ordering.

(DISCLAIMER: I don't really know the language used (VB.NET?) so it may be that additional guarantees are given by the language or library that I may have missed.)

Maks Verver on December 5, 2007 2:10 AM

What about employing several shuffling methods and alternate between them "randomly"?

Lahur on December 5, 2007 2:46 AM

I'd just use the JDK function.

http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collections.html#shuffle(java.util.List)

Funny that in my post about job interviews and reusing JDK functions to reverse a String one poster called me arrogant, because I thought most people when ask to write a String reverser would implement it instead of reuse it.

http://stephan.reposita.org/archives/2007/11/12/job-interviews-for-developers-attention-to-detail/

Thanks for proofing my point.

Stephan Schmidt on December 5, 2007 4:03 AM

> SELECT CardNumber FROM dbo.Cards ORDER BY NEWID()

Dear god... No!!

Shannon on December 5, 2007 4:58 AM

Has Catto given up reading Coding Horror?

Mat on December 5, 2007 5:10 AM

Way back circa 1989 I implemented a blackjack program using the sorting algorithm you suggested. The interesting thing was that the "platform" was a spreadsheet -- the whole thing was implemented in spreadsheet macros, and it was a complete implementation of the game, with insurance, splitting, etc.

The sort was done by having a named range of cards with 2 columns: the card itself, and the forumla RAND(). To shuffle I just told the spreadsheet to sort the range. I was pretty proud of that little thing.

At the time, I was in college and working part time as a tester of an office application suite. The guys in my department each implemented a different casino game as a means of pushing the tests as far as they'd go.

One guy did craps, and set it up to play automatically all night. The next morning we came in and it was rich, winning a bazillion dollars. We figured "that can't be right -- over time, you have to lose", so we ran it again the next night, and got the same result. It turns out that he'd uncovered a flaw in the spreadsheet's random number generator. Through some quirk, the dice rolls came up 11 (and 12) much more frequently than they should have.

Chris Wuestefeld on December 5, 2007 5:40 AM

Thank you, this post very clearly demonstrates the need to do quality work when developing software. A lot of times we take shortcuts and somewhere down the line, chaos happens. And thank you Mike for that excerpt from the PokerStars web site.

Sameer Alibhai on December 5, 2007 5:40 AM

"@Andrew R RE: You should be constantly feeding the inputs with anything even remotely random that you can lay your grubby mitts on... keystroke timings, hard-disk response times, bytes received at network layer, CPU cycle count, ring-0 context times, etc. Everything. Anything. All the time."

I recently stumbled across a Java RNG that sought to "improve" on the one in the library by feeding the result back as the seed. When working on numbers in fairly small ranges it converged on the stable patterns of repeating numbers very quickly. At 5 digits it only produced a handful of results over and over. The requirement in this app was "few duplicates" regardless of any patterns or predictability. I just took out the "improvements."

How do you factor in those feedback sources to avoid this problem?

Jim on December 5, 2007 6:18 AM

Interesting. Your physical definition of shuffle differs from mine. You're using the computer science notion of shuffle, not what actually occurs when one shuffles a deck of cards in the standard way.

As part of a course on counting theory, I built a card shuffling algorithm that models the way that humans shuffle a deck.

That algorithm, roughly, is:
1. Divide the deck into two stacks.
2. Pop the top item from the first stack into a result stack.
3. Pop the top item from the second stack into the result stack.
4. Repeat steps 2-3 until both stacks are exhausted, and the result stack contains all 52 cards.

What's interesting about this algorithm is that the results appear random by human observers, but are quite deterministic. If you shuffle the stack perfectly either 8 or 26 times (depending on implementation of the algorithm), the deck returns to the original order!

This is the algorithm used by most card sharks and all automatic card shufflers in Reno and Vegas, even if they don't call it an algorithm.

Then as part of this college course, I looked at the entropy introduced by non-perfect shuffling. Which was pretty interesting. It came in handy when working in Reno, although I wasn't legally allowed to bet anything.

By the way, thanks for the link to Cigital's case study! Contact me if you have any questions, or want some code for the above algorithm:
bwalther@cigital.com

Ben Walther on December 5, 2007 6:38 AM

By the way, Cristian's Python code is wrong:

He said:
import random
shuffled_deck = random.shuffle(range(52))

it should be:
import random
deck = range(52)
random.shuffle(deck,random.random)

The shuffle sorts it in place, and returns NULL. The second variable is not optional; it is what specifies which random number generator to use. If none is specified, the deck is not randomized at all.

(in Python 2.5 at least)

Pythonista on December 5, 2007 6:46 AM

A possible source of randomness that is available to everyone is environmental RF noise. This can be sampled off of the line input from the sound card. Best if it is hooked up to a radio between AM stations, but even with nothing plugged into it, there is usually a little noise. In an actual casino, a microphone may be a good source as well. What is needed is a real world (as opposed to computed) source because the real world is not predictable. Sample this several times, and combine the results into a seed.

Not quite radioactive decay, but pretty good for the price.

Grant Johnson on December 5, 2007 7:11 AM

I was about to post that someone should build a web service that provided realistic random numbers, but with a bit of quick googling it appears that it exists already. And as another commenter suggested, it uses environmental noise. Cool stuff.

http://www.random.org/

JohnFx on December 5, 2007 7:34 AM

The sorting shuffle is biased! Here's why:

Sometimes, the random number generator will assign the same number to two different cards. (Because of the Birthday Paradox, this can happen *much* more often than you might expect.) The order of cards assigned the same number is then just a property of the sorting algorithm. For instance, if you're using a stable sort, your shuffler will have a slight order-preserving bias.

Whether this bias is large enough to worrying about depends on the context, but it can be hard work to prove that the sorting shuffle is good enough for your purposes. So why not just use the Knuth shuffle instead? It's really not so hard; I can describe it in one sentence: "To shuffle n cards, shuffle the first n-1 cards, and then swap the nth card with the card at a randomly chosen location."

Peter on December 5, 2007 8:59 AM

Any algorithum is going to be predictable where are "random" numbers should be unpredictable. So there is a problem right there.

Sampling noise and mouse click helps makes the output more unpredictable, thus the resulting number should be unpredictable.

Of course there are many ways to shuffle cards. Throw all cards on table and move them around with your hands. Stack, cut and deal. You could also try to mimic that behavior and consider the deck shuffled.

Jon Raynor on December 5, 2007 9:02 AM

There is also probabilities to be considered. If decks are shuffled, probability says 1/52 chance of first card being an Ace of Spades. Now you may shuffle a thousand times and not get an Ace of Spades as the first card, but with a large amount of attempts (and I do mean large), the percentage should converge to 1/52. Otherwise, someone may think your shuffle is rigged.

So not only does you shuffle have to be random, but its results should converge to known probabilities over time.

Jon Raynor on December 5, 2007 9:09 AM

There is no random, there is only unpredictable.

brian on December 5, 2007 9:32 AM

Two thoughts that are being addressed here:

1) If you are doing anything you care about, research the random number generator (PRNG) you are using. The best ones will have passed a number of theoretical tests to deterimine that they indeed have no pattern. Fun to note that one of the quick litmus tests is to generate a large number of 5-card poker hands and see how the result aligns to computed probabilities.

2) Using noise and other environmental sources is tempting. However, I think this creates a maintenance burden to ensure that the quality of the random number stream is high and stays high!

Jason B on December 5, 2007 9:35 AM

I'm working on a gambling program, and made the decision to collect random #s from www.random.org. This site makes the following statement: RANDOM.ORG offers true random numbers to anyone on the Internet.

It's a different approach - I created a data file of 10k #s, and run through them in a circular fashion each time I need a new #.

Gene Sewell on December 5, 2007 10:23 AM

The guid approach sounds approximately as bad as the original random number generation approach. What they really need to do is focus on randomizing the seed value, and I really don't think that would take much for this purpose.

I have a few ideas for one. Random typing by a trained monkey would be my second choice.

Jess Sightler on December 5, 2007 11:05 AM

* As others have noted, guids are not guaranteed to be a good source of randomness. Though most implementations are in fact crypto-strength random, it would be wise to call a method guaranteed to return crypto-strength randomness.

* The problem with the swap algorithm is not that it is too complex. It is perfectly straightforward. The problem is that some of the n! possible outputs are more likely than others.

Eric "purple guy" Lippert on December 5, 2007 11:47 AM

I'd probably use a few different "shuffling" techniques back to back – If possible, I'd change up the order in which the different techniques where applied.

It wouldn't be as efficient as some people's suggestions, but I think it'd produce a fairly random shuffle; at least as far as users would notice.

Frank on December 5, 2007 12:50 PM

I don't think anyone has mentioned HotBits yet (http://www.fourmilab.ch/hotbits/) -- random bits made to order, generated from radioactive decay. You can build your own as well, instructions and code is on the site (the inline assembly brings back fond memories ;-)

bjorn on December 5, 2007 4:04 PM

Peter: That is exactly why I suggested using a random real number between 0 and 1.

If the randomness source is crypto-strength random, and you generate batches of 52 double-precision sort numbers then you should expect there to be collisions in less than one out of every twenty trillion decks generated. (The math justifying this claim is left as an exercise.)

Crypto-strength guids have a much _smaller_ collision rate than doubles. I haven't worked out the math exactly, but order of magnitude, it'd be around one out of every trillion trillion decks.

In short, this is not a significant source of bias.

Eric "purple guy" Lippert on December 5, 2007 4:35 PM

> The problem with the swap algorithm is not that it is too complex. It is perfectly straightforward. The problem is that some of the n! possible outputs are more likely than others.

Eric, I read your original description of this, and I even went over it with another programmer here. We *still* don't understand how the Knuth shuffling algorithm can have any bias, assuming the random number generator is crypto-strength.

Jeff Atwood on December 5, 2007 5:36 PM

@ Western Infidels
"... ill-equipped to even measure the true randomness of any such source of entropy... expert-designed schemes for entropy distillation and accumulation.... not just "roll our own.""

I agree. Absolutely. In fact, I'm pretty sure that is precisely what I said.

@ Jim
"I recently stumbled across a Java RNG that sought to "improve" on the one in the library... I just took out the "improvements."

Good for you. I hope that worked out for you, though I conjecture that both versions are rubbish.

Did either of you realise what you were actually suggesting in your following statements abot entropy sources?
Let's review...

ME: you should use absolutely any even source of even remotely/potentially random noise as input to your PRNG
YOU: how do you purpose to judge the value of an entropy source?

So...
You're suggesting that a PRNG have NO source of entropy?
Perhaps you're suggesting that somehow the PRNG has a magic internal yet-software-based source of entropy?

Entropy has to come from somewhere and it sure ain't embedded in the PRNG itself. Meanwhile, I can't judge the value of any source. But I *can* provide a crapload *of* them, which will result in the largest possible yield of entropy. So I'll do that and trust that the person who wrote the PRNG is pretty smart and knows to push their entropy sources into block-cipher/hash algorithms like SHA1 and mix it all up etc.

I suggest you both read up on the Fortuna PRNG.

I repeat: it is an impressive piece of work - I cannot do it justice here.

Andrew R on December 5, 2007 7:20 PM

As others have said, the problem with Random is often the seed. .NET's random number generator is pretty decent if you use an unpredictable seed:
Random rand = new System.Random(Guid.NewGuid().GetHashCode());

For something like online gambling where significant money's involved, the crypto RNG would be better. In that case, you'll need to give up on a one-liner (unless it's one unreadable line), since the RNG returns an array of bytes.

http://msdn2.microsoft.com/en-us/library/system.security.cryptography.rngcryptoserviceprovider.aspx

Jon Galloway on December 5, 2007 10:17 PM

Jeff: Julian's Array Swap algorithm is not the same as Knuth's. The difference is in the line:

R = Random number from 1..52

Knuth's version would be:

R = Random number from 1..X

Eric's (really neat!) proof of bias only applies in the first case.

Eric: I was about to flame you, but then it occurred to me that you're a smart guy who earns more than I'm ever likely to, so maybe -- just maybe -- I could learn something.

So why do we differ?

You like Sorting Shuffle because it's simpler and easier to explain.

I like Knuth's Shuffle because it's more elegant and easier to prove correct.

When you're managing programmers, Sorting Shuffle wins every time. Having said that, the Birthday Paradox still leaves me feeling a little uneasy....

Okay. That's enough hair-splitting from me.

Peter on December 5, 2007 10:25 PM

Mr. Sean Patterson has an interesting quote:
Assuming you're using 8 "piles" you take the top card of the deck and put it in pile 1, the second in pile 2, and so on. When you go through all your cards (minimum of 40, no maximum) you then grab your piles at random and you have your deck again.

Which comes directly from one of the many Collectible Card Games (in this case I would guess Magic the Gathering, limited play has 40-card decks). It's neat that he's mentioned this shuffling method b/c a game like Magic hinges on a balanced distribution of a couple of classes of cards. You'll have a deck comprised of 3 classes of cards in a 17-16-8 distribution and then you'll start with 7 off of the top. For the game to really flow you'll need 2-4 cards from the first class (17) 2-4 cards from the second class (16) and 1-2 cards from the last class (8).

Normally, this works, but every once in a while the hand is "no good" and you have to take a mulligan. To avoid this, rookie & casual players will pick up the played cards (about half of the deck), pile them by class and then "weave" them together. They then do a quick shuffle: a couple of riffles and overhand cut or two. This type of shuffling generally cuts down on the mulligans because the cards are either nicely ordered or clumped in small groups (still acceptable).

Of course, the method described by Sean is actually considered cheating at the pro level (yes there is a pro level). "The weave" is something like an automatic disqualification and pro players don't even stack their cards by type, they just pick up everything between games and shuffle away. The shuffling ritual can suck up 5 minutes of a 50 minute round, but hey it's part of the game.

Of course, there's also Magic: Online which provides a digital version of the game. Some class of online players have personified and villified "The Shuffler" as if some machine construct were out to get them.

In reality, the real-life pros also dominate the on-line game, despite all of this randomness. Which is likely a sign that they're shuffling pretty well :)

Gates VP on December 5, 2007 10:56 PM

When I first saw this example of shuffling, I had no idea about syncing the random seed with another could lead to such a security hazard. But I don't believe using a GUID is a perfect way to generate a random number, unless you know that it's algorithm produces an equally distributed random number--and that is determined by the GUID algorithm's "variant and the version" (http://www.famkruithof.net/guid-uuid-make.html).

So either be careful when choosing the API your GUID is generated from, seed your random generator with a unique number, or randomly sort the GUID's

[If you don't know about hash tables, find a tutorial about the various ways you can make them. Very interesting if you enjoy data structures and algorithms]

My idea is to use a hash table to sort the cards into by each cards generated GUID. By using a randomly generated number in the hash tables algorithm, this guarentees evenly distributed shuffling as well as essentially O(n)--which is only interesting when you come across much a much larger amount of cards (i.e. encryption and such).

In this way, if you use an API which is inflexable (doesn't let you seed your random number generator or choose your GUID algorithm), you can still adapt an equally distributed GUID method.

James on December 6, 2007 6:04 AM

If you need real random numbers ie your business depends on it. You need
to decaying radioactive material.

Example of that is below:

http://www.fourmilab.ch/hotbits/

http://www.fourmilab.ch/hotbits/hardware3.html

nuff said.

Scott Wickham on December 6, 2007 8:14 AM

My apologies, indeed, I conflated Knuth's algorithm (which has no bias) with the algorithm on Mike's blog (which does have bias). Knuth's algorithm has n! possible paths, each to a distinct output, each equally likely. Julian's algorithm has n^n possible paths to n! distinct outputs, and since n! does not usually divide n^n evenly, some of those have to be overrepresented.

Eric Lippert on December 6, 2007 8:24 AM

All the talk about random number generators aside, I think we've lost sight of something here: How physical shuffling is done. The "perfect" shuffle algorithm actually shuffles like a casino dealer would, not in some pseudo perfect mathematical sense. Standard gaming rules dictate 7 shuffles in a standard 2 halves fanned back into 1 style with a cut at the end. This means that any card starting at the dead bottom of the deck has no way of making it all the way to the top of the deck, and vice versa prior to the cut. In short, the shuffle is not a true random reordering by any stretch.

Anyway, here's my take on how I'd do this:

1) Start with 52 cards in an array.
2) Split the deck at a "random" half somewhere between the 1/3 and 2/3 marks with weighting towards the middle.
3) Walk the index alternating between your start position and your half position reading 1-5 cards at a time at random (see 3a)
3a) The random algorithm used would need to have 2 weights included. first, a base weight towards 2 or 3 for realism. Second, a calculated weight based on the half of the decks relative remaining size as compared to the other halfs remaining size. This "corrects" instances of one side of the deck running out of cards significantly faster than the other, as any physical shuffler would also do.
4) Repeat steps 2-3 for a total of 7 iterations
5) Allow the player a way to "cut" the deck, or calculate a "random cut anywhere in the deck, weighted towards center.

Not in code because, honestly, I don't have time right now. Maybe I'll do it on the quick and dirty via Ruby. I think this is closer to reality of a shuffle, though.

vince on December 6, 2007 8:44 AM

Sorry to ask a silly question in the midst of such heady discussion, but I just can't figure out the snippet:

var cards = Enumerable.Range(0, 51);
var shuffledcards = cards.OrderBy(a => Guid.NewGuid());

- What the heck is a?
- Where is/are the other params to OrderBy (docs say this method needs at least 2)
- In the lambda expression, how can a be a param to NewGuid which doesn't take any?

Card carrying 80% member on December 6, 2007 9:31 AM

Re: The first problem

Several people have mentioned using implicit user input data as the actual source of random numbers (mouse coordiantes, key press distributions, etc). I think that in many cases, however, this is still vulnerable to reverse-engineering by a resourceful hacker. Plus your range of possible values is limited by the granularity of the implicit user input data.

Instead I suggest to just pull and discard numbers from your PRNG in a core loop somewhere in your code. Maybe a message-processing loop, or when waiting on user input. Just make sure it's somewhere that gets run on a fairly constant basis anytime the app is running. This is a simple way to ensure there are no "synch points" in your app that a hacker can count on to line up their PRNG with your app's. Because you're constantly generating numbers and there's no way to know how many will be generated between the last time you actually used one and the next time you will actually use one.

WaterBreath on December 6, 2007 10:40 AM

Re: Bias
At first I thought your algorithm had a bias because (to paraphrse Peter) you're choosing a card to swap with in the range [1..X] instead of using the the range [1..52]

wikipedia on Knuth's Shuffle (http://en.wikipedia.org/wiki/Knuth_shuffle) gives a good description as to why this _isn't_ the case. In brief, swaping with a card in the range [1..52] gives 52^52 possible swap sequences, but there's only 52! possible decks. 52^52 > 52! so there are multiple sequences leading to a particular deck. 52^52 is not divisible by 52! so some decks have more sequences leading to them then others.

Thanks for a great article!

Mike on December 6, 2007 11:18 AM

I know this is a late comment, but I have something to add to this. I don't have experience with deck shuffling algorithms, however I can't really see much problem here. Even though random and psuedo random coding is sketchy at best, when it comes to deck shuffling, it shouldn't be difficult to figure out how to randomize the deck.

Take real world shuffling with the hands, you roughly cut the deck in half, then attempt to merge them so that the deck is in staggered as perfectly as possible. However in real life it doesn't happen with perfection, and therefore there are always a few to several cards that are "misplaced" in the pattern. repeat until one is "comfortable" with the randomized nature of the deck.

One could simply emulate this with a few random routines. Literally randomly deciding which is the next card to merge into the new deck from the split decks.

If one loops the deck through something like this, the results should be staggeringly more random than initially suspected.

dim deck
for i = 0 to 12

main(

Scot McPherson on December 6, 2007 12:26 PM

I would like to have a preview first, I hit the post button accidentally while writing thise. Though I suppose I should have written in a text editor first.

I don't think you need the rest to understand what I mean though.

Scot McPherson on December 6, 2007 12:27 PM

Catto must be sick, how long has it been since he missed a post?

Craig on December 6, 2007 12:41 PM

For those asking about why the built-in Rand() function and Timer seeds are less than optimal.

Rand() uses the generated Random number as the Seed to the next number. So even if you used the Timer to start the initial seed you can predict every number after the first randomly generated one as it is a sequnece that can be repeated every single time.

So to try to clarify, if the Random number generated using Rand() and a Timer seed was 52, then 52 is used to seed the next number and you will get the same value every time, in fact RandomNumber2 through Random NumberInfinity is now predictable.

All that using the Timer for a seed does is scramble the first possible number.

Rand() is classified as a Linear Congruential function.

A good random number generator has three properties:
1. It generates evenly distributed numbers
2. The values are unpredictable
3. It has a long and complete cycle (it generates a large number of different values and all the values in the cycle can be generated).

Linear congruential functions meet the first property but fail the second property miserably! In other words, rand() produces an even distribution of numbers, but each next number is highly predictable!

Let’s look at the source code to a typical rand function:

unsigned long int next =1;
int rand(void)
{
next = next * 1103515245 + 12345;
return (next/65536) % 32768;
}

DotNet has a much better random number generator called RNGCryptoProvider.

I appreciate the elegance of Jeff's solution in that it reduces the required code to just several lines. I have used this exact same problem in my programming classes and have seen solutions with 100+ lines of code over the years.

The other advantage to using random numbers and sorting is that it scales very well to very large number sets.

David E. on December 6, 2007 1:00 PM

@WaterBreath
"Instead I suggest to just pull and discard numbers from your PRNG"

How, pray tell, do you intend to make your "pull and discard" algorithm non-deterministic?

Perhaps you can use a random timing?
But, oops, that requires a random number, which requires a non-deterministic PRNG... in other words, you need a non-deterministic PRNG to build your non-deterministic PRNG...

Andrew R on December 6, 2007 5:52 PM

Eh Jeff, you disapoint me!
Do you really need to show algo-complexity when N=52? Let asume you do not do a N^8 algo, and it OK with that.

Second point is: Random() sucks for cryptography but it is enough for human's game. Assuming no one can know the initial generator number, rand.Next() is ok.

Tell me if i wrong, but i belive if i give you this algo:
Random rand = new Random(x);
while(true)
{
Console.WriteLine(rand.Next(100)%2 == 0);
}

And i give you the 50000 first results:
true
false
false
false
false
true
true
...
Do you will be able to retrieve the "x" number?
I think the answer is no.

In this case the only problems for a casino is to find a good "x" that now one can predict. I have no idea of how to.

It seem to me that all of your sort carts algo are OK. So the best one is the easier to understand.


Olivier de Rivoyre on December 7, 2007 9:35 AM

Thinking about my comment I made before, and using a complicated hash table (which is very interesting to learn) to sort your data into, and then take it out in sequential order is a bad way to do it. Of course it was the only thing I could think of on my 15 minute break at work.

Using your GUID idea, just adding the randomly generated number to the GUID would guarentee a "good enough" random dispersion with a unique number.

#If random number is a long integer Then
randomIndexOfCard = random() + (getGUID() mod [larges random number possible]);
#Else
randomIndexOfCard = random() * sizeof(long int) + getGUID();
#Endif

Loop until each card indexed and sort the cards as a shuffle.

Then all you have to do is shuffle, rince, and repeat! (7 times sounded like a good number :)...

James on December 7, 2007 3:20 PM

Mmm, Java has a shuffle functin, I had no idea.

Wrote my own shuffle mechanism which is exactly the same as the one posted...

I guess it does prove that it's easy ;) Although I was pleased that it went in O(n).

Carra on December 8, 2007 3:16 AM

> How, pray tell, do you intend to make your "pull and discard" algorithm non-deterministic?

Maybe I am being naive here... But if so could someone explain what about my proposal is any less deterministic than what others have proposed?

How could someone predict how many numbers will be discarded unused while, for example, the app is waiting for the user to click a button? The actual numbers used for the randomness-critical app features will not be consecutive numbers generated by the PRNG. How far we jump forward in the numbers we actually use depends on how many cycles of processor time are actually spent in that loop, discarding numbers. I'm sure this isn't completely unobtainable info, but I don't think it is any easier than, say, hooking into the mouse location data.

I'd guess this has a similar effect to reseeding your generator often. But, not knowing a whole heck of a lot about the math behind PRNGs... maybe you're saying this somehow makes the numbers less random?

Waterbreath on December 8, 2007 9:27 AM

> Perhaps you can use a random timing?
> But, oops, that requires a random number, which requires a non-deterministic PRNG... in other words, you need a non-deterministic PRNG to build your non-deterministic PRNG...

If I'd read the comment a little more carefully, I would have quoted his question in my last comment, because that is what I am answering.

Again, I'm not saying it's perfect. It can be hacked. I'm just saying that it's no worse than other people's more complex solutions.

WaterBreath on December 8, 2007 9:32 AM

For all you old-timers out there, heres some code you can run on your VIC 20 to shuffle those cards...

...
5 cntr%=1
7 REM ** cntr% holds the order in which cards are drawn
10 DIM a(52)
13 REM ** 52 cards in a standard deck
15 REM ** initialise them to 0
17 FOR c=1 TO 52:a(c)=0: NEXT c
19 REM ** pick a card
20 i%= INT( RND(1)*52)+1
25 REM ** look to see if this item can be inserted
30 IF a(i%) <> 0 GOTO 20
35 a(i%)=cntr%:cntr%=cntr%+1
37 REM ** if we haven't picked 52 cards, go again
40 IF cntr% < 53 GOTO 20
45 REM **
47 REM ** display results
49 REM **
50 FOR z=1 TO 52: PRINT a(z): NEXT z
...

Not so much Try and catch, more like pray and hope ;o)

Steve on December 9, 2007 4:38 PM

@Waterbreath

> ... could someone explain what about my proposal is any less deterministic than what others have proposed?

Determinism is all I was pointing out. Anything that is deterministic makes up no part of the randomness in the result.
eg. Giving a PRNG it's own output as a new input seed does nothing for the randomness (though some generators still do this for security reasons). Ie. anything that you know to be deterministic may as well be removed.

> How could someone predict how many numbers will be discarded unused while, for example, the app is waiting for the user to click a button?

Phrase the question slightly differently: where is the true source of randomness?
If you said "the timing of the mouse click", you'd be right. Thus, it is the only thing an attacker needs to control, eg. by stuffing a mouse-click event into the queue by some means, and voila, complete compromise; the attacker knows the exact state of the PRNG. Bugger.

Conclusion: pulling numbers from the PRNG did nothing to increase the randomness, you were actually relying solely on the mouse click for all your entropy.
In other words: you may as well not bother with pulling numbers because it's deterministic. The mouse-click was good, its (one of) your source(s) of randomness; when it occurs you can roll the timing into the PRNGs input. It will contribute whatever tiny entropy it might have (possibly none if the attacker has corrupted it - but that shouldn't matter because you have many other sources... right?).

> I'd guess this has a similar effect to reseeding your generator often...

Ah, I see the problem. You are accustomed to the cute little things referred to as "RNGs" in many standard libraries; you seed them once and then use them forever. Those are a poor shadow of a PRNG - you basically just shouldn't ever use them. They are, often, trivially easy to brute-force attack.

A real PRNG has inputs as well as an output. "seeding" as you know it, should be occurring frequently in an unending stream, never ceasing, from many sources.

So... what are the sources?

Anything.

If you aren't certain of the value of a source (and you shouldn't be, because who can accurately judge the entropy in a mouse-click?), stuff it in anyway. A real PRNG is unaffected by a compromised entropy source because its output is the ENTROPY SUM of it's inputs. Thus, if one source is compromised (or simply deterministic from the outset), it is merely a source with zero entropy; it is not detrimental in any other way (so long as the remaining sources sum to the level of entropy required).

Andrew R on December 9, 2007 7:34 PM

"The standard way of implementing this algorithm is: associate each card with a random real number between 0.0 and 1.0. Sort the list based on its associated number. That's O(n log n) and has no bias."

Couldn't you use integers instead of real numbers, then use a non-comparison sort like bucket/radix sort?

Tom Robinson on December 11, 2007 12:04 AM

this is my way of getting a random number (please don't criticize the memory access part, simply ACCESSING memory is safe)

When the app starts, itll do a simple srand (i use C/C++) with the parameter of the the time in milliseconds divided by how long the app has been running. it then mallocs an array (no calloc, thatll make everything 0) the length of the random number. It then finds out how much time has passed since the last time we checked the apps run-time. the difference is modulated by a number produced by a call to rand() untill it is a number small enough to fit in the allocated memory. Then access the that entity of the array and you will get a random number from bytes that arent even part of your programs. If its 0, the program retries with a dif rand number until a successful number is obtained

seems long and complicated, but doesnt use any long or system specific procedures and is quite fast since all are functions are part of the language and simple math/memory operations

JavaGuy147 on May 12, 2008 3:39 PM

They could collect a seed like truecrypt does with mouse moves and/or some mandelbrot point.

J on June 22, 2008 9:05 PM

I'd just do the calculation for each card as required.

Less tampering possible I'd think

Pi on August 24, 2008 8:49 PM

This seems to do the trick very well for me, ported from an old ass javascript shuffle prototype I made a loooooong time ago.

One thing I didn't understand with it though, is if I used
"r.Next(0, m_cards.Count - 1)" which would have been the true final index, it stuck itself in an infinite loop, oh wellz. works now =)

public void Shuffle()
{
Random r = new Random();

List<int> used = new List<int>();
List<Card> temp = new List<Card>();
do
{
int index = r.Next(0, m_cards.Count);

while (used.IndexOf(index) > -1)
index = r.Next(0, m_cards.Count);

used.Add(index);
temp.Add(m_cards[index]);
}
while (temp.Count < m_cards.Count);

m_cards.Clear(); // not necessary
m_cards = temp;
}

RRyyaann on September 3, 2008 9:30 AM

why did my msg get removed x.x

RRyyaann on September 3, 2008 9:50 AM

okay and now it's not. what the heck.

RRyyaann on September 3, 2008 9:51 AM

now what I was gunna say before my browser went screwy.
how could you possibly 'predict' my post example?

you run that code once, twice, three times, they're all different. That means somewhere in the world. Somebody trying to sync their generator to the seed of the random() their code is going to produce a different result.

Even being that it's based on a random number between 0 and 52, sure there are limited numbers of output. But being that it makes sure it never reuses the same index before placing the card back in the deck makes it very hard to predict the actual output... by any 'algorithm'.
Now I'm definitely not the greatest programmer in the world. But it seems so farfetched that a person (or code) could predict:
a) the randomly generated number (more possible)
b) the next number to be placed in 'used' (less possible)

Of course, we all program to learn (or make money) so please, if I am wrong, I wanna know why =)

RRyyaann on September 3, 2008 10:04 AM

make an array of 52
assign a random number between 1 a 100000 to each entry
assign the first card to the highest number, the second card to the second highest number, etc.

Rob on September 19, 2008 4:19 PM

I am just a poker player who is researching RNG integrity as it relates to my online poker play.I have no programming experience nor a high education.I have been playing on pokerstars for 3 yrs and have been suspect of the randomness for a long time.Most arguments in this area are about other players being able to predict the rng and act accordingly to win.Thus, all this hullaballoo on how to create a truly random shuffle to nullify these hackers attempts to steal me hard earned dollars.All of this seems to me to be smoke in eyes of the masses.Corruption has always been seeded at the top of the food chain and not us lowly joe's tryin to make a buck to feed our families and pay our bills.I am far more concerned about corruption at the highest levels within my arena, ie, Pokerstars, Intel, and the 2 companies that certified their RNG.Even if pokerstars itself has no part in this conspiracy theory, although they clearly stand to benefit the most from the ability to control cards dealt to players and when, the 2 companies they hired to verify integrity of their Intel RNG were given all their source code and means of seed generation, meaning those employees who performed such tests could take that information and develop code to create an ATM for themselves via pokerstars and/or its players.As for Intel, if they are out to screw us then we are all lost.I could go on and on about my conspiracy theories as to Pokerstars and how they do it, but I really only wanted to make two points about all that I have read here as it relates to online poker, the shuffle, and collusion:
1. Being that the cards are dealt one at a time to the players in a circular motion, if the cards are "too random" in a shuffled deck about to be dealt, wouldn't that be creating a greater likelihood of players getting dealt pairs and suited connectors, and that those same cards would fall on the ensuing flop, turn, and river? ie trips, flushes, 2 pairs, and full boats.Like if all four aces were stuck together at the top quarter of the deck then 4 out of 9 or 10 players would receive one ace and a random card, not pocket aces.I think that these decks are over-shuffled and that is creating these common instances of pocket rockets, trips, full houses, and quads even.Ive seen quads way to many times.This also creates a much more exciting game for the players.We all want to hit those big hands, not just win playing a solid hand.It would seem like a win win scenario for all involved to taint the rng just enough to make play more exciting at their website verses another... right? I ask questions I already know the answer to, I just wonder if anyone agrees with me.

2. If Pokerstars, or any other online gaming site, its employees and programmers, were out to control the game and the outcomes for profit, above and beyond that which they already make with the odds in their favor, then who would know it? Who could prove it? Who could stop it?
Who would even care except for the idiots like myself who keep playing the machine believing its fair and random and that I just need to improve my play ?

Lastly, I just want to say that overall I have made more money than I have lost playing online poker and should be the last person to raise these questions or even care, but I really feel as though something is wrong with the RNG that I have been intimate with for many years and if so, then that is wrong,whether or not it has negatively affected me.I feel that these RNG's should be built by and subject to the scrutiny of people like yourselves, much as how Linux flavors have been developed and scrutinized by the knowing public and not a private multi-million dollar company who's employees cant be trusted, nor the CEO, who would gladly take a % or a payoff to say the RNG is good.

Rich on November 29, 2008 2:20 PM

when writing a card dealer are there any requirements other than:

1. not predictable
2. during a hand don't deal the same card twice ;)
3. the cards should be dealt randomly

dbasnett on February 14, 2009 3:24 PM

perfect shuffle in perl :

sub shuffle {
my @a = ();
my @b = ();

return @_ if (@_ < 2);

for (@_) {
if (random_bit()) {
push @a, $_;
} else {
push @b, $_;
}
}

return (shuffle(@a), shuffle(@b));
}

This is similar to the radix sort algorithm with random numbers.

random_bit returns a boolean. This bit can be read from /dev/random for true randomness on linux.

GuB on March 11, 2009 9:30 AM

Very cool app. Can you tell me where to download the Help files? Apparently they didn’t make it during the install process, so it won’t load them. Thanks!

sohbet on July 6, 2009 4:46 AM

Thank you very muCh

sohbet on July 19, 2009 8:54 AM

Thanks for addressing this issue so quickly, Dr. Lessig. I thought something was fishy in that article.

chat siteleri on July 27, 2009 7:27 AM

So the OrderBy criteria:
- var shuffledcards = cards.OrderBy(a => Guid.NewGuid());
is basically equivalent to
- Random rand = new Random()
- var shuffledcards = cards.Sort((a,b) => rand.Next(0,2) == 0 ? -1 : 1);
since a new guid is created on every comparison. So it's basically sorting with every Compare returning a random result - which isn't the same as the sort Eric mentioned - it will likely be biased based on the sorting algorithm used.

Kumar on July 30, 2009 12:13 PM

thank you was a useful article...

bedava sohbet on August 5, 2009 1:42 AM

Thanks a load for the stats. Haven’t seen anything nearly as comprehensive or useful anywhere else.

sohbet on August 16, 2009 8:47 AM

I can’t wait for the day when web design programs are as easy as layout programs for print!

chat on August 16, 2009 12:16 PM

how can i divide the deck into three parts and then shuffle them?

mn480591 on August 20, 2009 12:02 AM
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.