Shuffle - mike's web log/comments

New comments for this entry are disabled.

December 01, 2007 | Shuffle | 1379 hit(s)

One of the great things about not having studied computer science formally is that you get to discover interesting problems all by yourself, and then cobble together your own solutions. You can then, if you want, do some research in order to find out, as generally happens to me, that Donald Knuth identified, analyzed, and solved the problem several decades ago. Better than you did, needless to say.

So, shuffling. I was doing a thing with playing cards in software the other day, and got sidetracked (more like stopped short) by the interesting problem of how to "shuffle" the "deck." Obviously it involved random selections and such, but the actual mechanics were not immediately obvious to me.

Here's where the non-CS mind comes into play. My first thought was to generate an unshuffled deck as an array-like structure -- all cards in order by suit. Then I'd create a second array-like structure. I'd walk through each card in the unshuffled deck, pick a random number, and insert the card at the randomly selected spot in the second array. If the randomly chosen position in the second array was already occupied, I'd choose another random number, see if it was used, and so on until the random selection happened to land on a free spot. I'll call this the Random Insert approach.

But this seemed slightly fishy to me. For the first some number of cards, the random selection would not collide (much) with spots already filled. But as the second array got more and more filled, it would take the random-selection loop longer and longer to stumble across open spots in the array. Good heavens, how many random selections would it take to fill that last slot? There is, I'm sure, a mathematical way to express this ever-fewer-options issue. Not knowing what that is, I indulged a notion that this could get into squares and cubes of time and, who knows, bring the computer to a halt.

Before I even tried this, I started thinking of alternatives. One thought I had was to find random buckets for, say, the first 42 cards in the unshuffled deck. When I got to the last 10 cards, I'd stop randomly selecting slots in the second array. Instead, I'd just walk up that array and stuff the last 10 cards into whatever 10 spots happened to not be filled yet. That seemed sort of quasi-random to me, without the problem of the randomizer spinning and spinning quite so much to find a free spot toward the end of the process. This would be the Blended Random Insert approach.

I also thought that another solution would be to create the second array as a huge array, maybe 5 times as large as the deck. In that case, the randomizer wouldn't run into nearly as many collisions and would find spots for all the cards relatively easily. However, after all the cards had new homes, this would require crunching down this sparse array and discarding the 4 x 52 unfilled slots. I'll call this the Big Array approach.

Or perhaps another approach altogether. Rather than randomly finding new spots in the shuffled array, I could randomly select cards in the unshuffled deck, and then simply add them one by one to the shuffled array. To avoid the problem of potentially selecting a card again, I would simply chop that card out of the array. So the first time, I'd choose among 52 chards. The second time, among 51 items. The third time, among 50 items, and so on, until all the cards had been randomly selected and moved over to the shuffled array. I'll call this the Array Chop approach.

What the heck, I tried three of them. (I didn't try the Blended Random Insert approach.) You can see them here. Source for the shuffle-y part is here, so you can tell me where I'm going wrong.

Update I added Julian's "Array Swap" approach as a fourth experiment -- see comments.

As an experiment, I shuffled each deck five times. To do that, I took the output of one shuffle and fed it back into another shuffle. Might this result in additional randomess? Kinda hard to tell from looking at the results.

To time these, I used a "high-performance" timer that was recommended by Scott Hanselman. I don't care about units, just the comparative values.

Speaking of which. What I find is that the difference between the ~~three~~ four approaches is almost negligible. The first time or two that you run the shuffle, the process can take "a long time" (some small fraction of a second). Subsequent tries go down by a factor of three; clearly something is being initialized the first time or something. If there is any of the approaches that's slower, it's the Random Insert method; more accurately, it's the algorithm[1] whose timing varies the most across multiple tests.

Anyway, just for fun. At some point I need to get back to whatever it was that I needed shuffling for.

[1] Algorithm was the Anu Garg's Word of the Day just this week.

Rik Hemsley 02 Dec 07 - 1:46 AM

I found a well designed algorithm here: http://www.secureprogramming.com/?action=view&feature=recipes&recipeid=23

But for my toy card game, written in Ruby, I ended up using something very simple, but effective:

cards.sort_by { rand }

Julian 02 Dec 07 - 2:19 AM

I use an algorithm that I learnt by studying a long-forgotten Commodore-64 BASIC program in the mid-80s.

create an array of 52 cards.
for X in 1..52
{
R = random number from 1..52
Swap card X with card R.
}

Advantages:
* Simple.
* Takes a fixed, short amount of time to run.
* Doesn't require two arrays; it's all done in place.

mike 02 Dec 07 - 8:34 AM

Julian, I added the Array Swap algorithm. Seems about in the same range, time wise, tho it is, as you note, simpler and presumably takes less memory.

One problem, of course, is that it's difficult to assign a degree of randomness to any one of the approaches. For me, anyway ...

Eric Lippert 03 Dec 07 - 11:26 AM

Your original algorithm -- for each source card pick a destination slot at random, try again if it is already filled -- is O(n^2) on average for a deck with n cards. (Proving this is straightforward, you just have to solve a very simple probabalistic recurrance, email me if you want the details.) But yes, it is gross and the running time is probabalistic -- it could be arbitrarily long if you have bad luck.

Your next attempt -- for each destination slot, pick a random source slot, compact the deck every time -- is also O(n^2), but at least is bounded. There cannot be more than (n/2)(n+1) array moves during compaction.

Your third attempt -- for each slot, swap randomly with another member -- is efficient at O(n) but has the unfortunate property that some final configurations are more common than others, ie, this algorithm has bias. (Proof: do it with a deck of three cards. There are six possible shufflings. The algorithm does three swaps on three cards so there are 3^3 = 27 possible sequences produced. 6 does not go evenly into 27, so some of the 6 possbilities must be overrepresented in the 27 possible outcomes.)

The standard way of implementing this algorithm is: associate each card with a random real number between 0.0 and 1.0. Sort the list based on its associated number. That's O(n log n) and has no bias.

Note that the randomness source should be crypto strength. Otherwise it is possible to deduce the state of the entire deck from a portion of it. See my commentary on Greg's article here:

http://blogs.msdn.com/gstemp/archive/2004/02/23/78434.aspx

(Just substitute "cards in a hand" for "characters in a password".)

Julian 08 Dec 07 - 11:46 PM

Jeff Atwood took up this challenge in two posts.

http://www.codinghorror.com/blog/archives/001008.html

http://www.codinghorror.com/blog/archives/001015.html

I was forced to admit that he was right and I was wrong.

http://www.somethinkodd.com/oddthinking/2007/12/09/shuffling-and-ownage/

Sorry.

mike 09 Dec 07 - 12:44 AM

Yeah, I followed up:

http://www.mikepope.com/blog/DisplayBlog.aspx?permalink=1858