Saturday, January 06, 2007

More Random than Random

And here you are with the final sample article I wrote up:

Quick quiz: What do streak-shooting basketball players, Princeton’s Global Consciousness Project, and iTunes’ Smart Shuffle feature have in common? If you said, “They’re all the result of people misinterpreting random data,” then you’re reading this out loud and probably getting strange looks right about now (you’re also right).

The human mind doesn’t do well with the random, but, as with most flaws in human perception, there’s actually a good reason for this. In harsher times, when survival was less of a given than it is today, there was a distinct evolutionary advantage to finding patterns. A human who noticed many patterns was more likely to spot the pattern indicative of a predator, or realize that a seed dropped into the ground last year had turned into a sprout this year. A few extra patterns spotted may incur slight inconveniences, but it’s still a lot better than dying because you missed one.

The flip side to this is that now humans aren’t good judges of random data, and start to see patterns where there are none. For instance, take the following three strings of numbers, representing a random choice from 4 numbers:

12414412
23123441
21434231

One of them is completely random; the other two aren’t. Which one do you think is the random one? Human instincts lead most people towards the second or third choice, but it’s actually the first one, which deviates the most from the expected distribution and looks streakiest, that I generated randomly. It has too many 4’s and 1’s and absolutely no 3’s, but that was the way the dice rolled. In the long run, random data will indeed average out, but streaks are to be expected in the short term.
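As a rough check on how often a short random string "looks wrong," the sketch below counts the 8-digit strings over the numbers 1 to 4 in which at least one number never shows up. The function name is made up for illustration, and it assumes each of the four numbers is equally likely on every draw.

```python
from itertools import product

def fraction_missing_a_number(length=8, symbols="1234"):
    """Fraction of equally likely strings that leave out at least one symbol."""
    total = 0
    missing = 0
    # Enumerate every possible string (4**8 = 65,536 of them by default).
    for digits in product(symbols, repeat=length):
        total += 1
        if len(set(digits)) < len(symbols):
            missing += 1
    return missing / total

print(round(fraction_missing_a_number(), 3))  # 0.377
```

Under those assumptions, well over a third of truly random strings of this length skip a number entirely, just as the first string above skips 3.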

So let’s go back to the examples I listed at the beginning of this article. It’s a commonly held belief that in basketball, players are prone to streak shooting, often referred to as the “hot hand.” After a few successful shots, a player is believed to “loosen up” or “get into a groove.” The reverse supposedly happens after a few misses. To determine whether or not this actually happens, Gilovich et al. examined the records of the Philadelphia 76ers during the ’80-’81 season. The results were surprising, to say the least: It turned out that after a series of baskets, players were less likely to score again. On the other hand, after a series of misses, players were more likely to land a basket. What was really going on is that people were noticing the streaks that inevitably arise in random data and believing them to be non-random.
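To get a feel for how long streaks run when there is no hot hand at all, here is a minimal simulation sketch. It is not the method Gilovich et al. used; the function names, the 20-shot game, and the 50% shooting percentage are all assumptions chosen for illustration.

```python
import random

def longest_streak(shots):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 0
    previous = object()  # sentinel that equals nothing in the list
    for shot in shots:
        current = current + 1 if shot == previous else 1
        best = max(best, current)
        previous = shot
    return best

def average_longest_streak(n_shots=20, p_make=0.5, trials=10_000, seed=0):
    """Average longest streak (makes or misses) for a shooter with no memory."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Each shot is independent of every shot before it.
        shots = [rng.random() < p_make for _ in range(n_shots)]
        total += longest_streak(shots)
    return total / trials

print(average_longest_streak())
```

Every simulated shot here ignores what came before, so any streaks in the output come from chance alone.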

A similar situation arose with the shuffle feature on iPods at one point. Apple received numerous complaints and inquiries from people who claimed that their iPods were playing favorites. Some artists or albums seemed to show up much too often, while others showed up too rarely, if at all. Apple’s software engineers went over the random number generator again and again, and they found nothing wrong with it. And yet people wouldn’t believe them. Eventually, they decided they had to address it somehow, so they came out with the “smart shuffle” feature. It allowed people to decrease the likelihood that multiple songs by the same artist would be heard within a small amount of time. As Apple co-founder Steve Jobs described it, “We’re making it less random to make it feel more random.”

My final example is the worst of the bunch: Princeton’s Global Consciousness Project. This project was inspired by work done in PEAR (Princeton Engineering Anomalies Research), and it essentially boils down to people watching random numbers and trying to judge if they see deviations during major world events. And, surprising as it may not be, they’re able to find deviations from a perfectly chance distribution around every major event (and plenty of deviations around absolutely nothing, but they don’t talk about them). They claim that these deviations are due to some “quantum energy field” in the world caused by human consciousness. In the eight years that the project has been active, all it’s accomplished is spotting streaks in random data while wasting massive amounts of time and money. All of that could have been saved with just a slightly better understanding of what “random” means, but so far no one’s stepped in to put a stop to it.

6 comments:

Bob said...

Come to think of it, I think the PEAR Institute is closing its doors. According to a couple of Google searches and this wiki entry, it's shutting down sometime this spring, but I'm still reading up on the details.

Infophile said...

They indeed are. I mentioned this in a post a while back, but I didn't go much into it beyond saying that it was happening. Either way, this is quite good news. Unfortunately, the GCP isn't officially part of PEAR anymore, so I can't say for sure if it's being shut down as well.

Anonymous said...

I had a former coworker of mine tell me that his wife wanted to try for a third child, because she wanted a girl (they had two boys). So, I asked him if she understood that there was still a fifty/fifty chance that the next one would be a boy. He laughed, and said he'd bring that up. I don't know if he did or not, but they haven't had another kid.

Although, I have to question whether certain genders can sometimes run in families. My daughter is the first female born in my branch (father, uncle, four male cousins), in three generations. Since the male decides (ahem, tongue in cheek) the gender of the child, I wonder if some may produce more sperm with an X or Y chromosome.

Anyway, I also had a problem with Excel a few years ago. We developed a spreadsheet that would choose the place we went to lunch, using Excel's random number generator (yes we had too much time on our hands). However, each time we opened it, it started out with the same one, and for the next three, at least, it would go in the same order each time. Eventually, we just kept index cards with the names, and randomly picked them out of a bowl.

-Berlzebub

Infophile said...

Although, I have to question whether certain genders can sometimes run in families. My daughter is the first female born in my branch (father, uncle, four male cousins), in three generations. Since the male decides (ahem, tongue in cheek) the gender of the child, I wonder if some may produce more sperm with an X or Y chromosome.

Biologically, that shouldn't happen. Sperm are created through meiosis, which creates four cells from one parent cell, with each of these new cells having half the chromosomes of the parent cell. Since the chromosomes are copied once and the cell then divides twice, there will always be two daughter cells with an X chromosome and two with a Y chromosome.

Now, "female" sperm (with an X chromosome) are slightly larger than "male" sperm, so this might cause a slight bias somehow. On a population scale, I've heard that it results in a very slight bias towards daughters, but I don't know of anything that could cause that bias to vary between different men.

More likely, I suspect that it's simply confirmation bias in noticing the odd cases. In your case, you have a string of 6 males in a row. The chance of this happening is 1/2^6 or 1/64. If you generalize it to 6 of either gender, which would seem just as odd, the chance is 1/32. With all the families out there, there are still going to be a ton who show streaks like this.
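The arithmetic above can be sketched in a few lines. The function names and the figure of one million families are assumptions for illustration only, and the sketch takes each birth to be an independent 50/50 event.

```python
from fractions import Fraction

def same_sex_streak_probability(births, either_sex=False):
    """Chance that `births` independent 50/50 births all come out the same sex."""
    p = Fraction(1, 2) ** births
    # Allowing "all boys" or "all girls" doubles the chance.
    return 2 * p if either_sex else p

def families_with_streak(n_families, births=6):
    """Expected number of families showing a same-sex streak of either kind."""
    return float(n_families * same_sex_streak_probability(births, either_sex=True))

print(same_sex_streak_probability(6))                   # 1/64
print(same_sex_streak_probability(6, either_sex=True))  # 1/32
print(families_with_streak(1_000_000))                  # 31250.0
```

So out of a hypothetical million families with six births each, tens of thousands would be expected to show a streak like this by chance alone.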

Anyway, I also had a problem with Excel a few years ago. We developed a spreadsheet that would choose the place we went to lunch, using Excel's random number generator (yes we had too much time on our hands). However, each time we opened it, it started out with the same one, and for the next three, at least, it would go in the same order each time. Eventually, we just kept index cards with the names, and randomly picked them out of a bowl.

Sounds like a problem with seeding there. Most computerized random number generators are deterministic: in effect, they step through a huge, fixed list of pseudo-random numbers. This means that if one starts at the same place in the list (set by the "seed"), it will give you the same results every time. Most programmers get around this by setting the seed to be based on the current time whenever the program starts up.
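A small sketch of the seeding behavior described above, using Python's standard `random` module rather than Excel (whose internals aren't covered here). The lunch spots and the `lunch_order` helper are made up for illustration.

```python
import random
import time

places = ["Deli", "Diner", "Noodle bar", "Taco stand"]  # made-up lunch spots

def lunch_order(seed):
    """Pick four lunch spots using a generator started from `seed`."""
    rng = random.Random(seed)
    return [rng.choice(places) for _ in range(4)]

# Same seed, same "random" picks every time.
print(lunch_order(42) == lunch_order(42))  # True

# Seeding from the clock gives a different starting point on each run.
print(lunch_order(time.time_ns()))
```

The first line of output is always `True`, while the second will usually differ from one run to the next.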

TheBrummell said...

Sex ratios are theoretically heritable, such that particular families would have higher than 50% proportions of one sex, through many generations. There's a whole field of evolutionary biology and population genetics devoted to studying this, often with lots and lots of math.

I've never heard of a confirmed case in Homo sapiens, but I have heard rumours of lab strains of mice or other model organisms with skewed sex ratios. Often the mechanism turns out to be either highly frequent non-disjunction of sex chromosomes during meiosis (X and Y stick together, for example) or a mutation in the sex-determining locus, leading to (for example) XY females. Sterility shows up in many of these cases too, which raises questions about what a sex ratio really is when many individuals of one or the other sex are sterile. The 1/64 answer seems much, much more likely - detecting skewed sex ratios in real populations requires enormous sample sizes, which is why these studies tend to use things like Drosophila.

A couple of years ago, a colleague was working on sex determination in guppies (Poecilia reticulata), because some of her breeding females were producing only daughters (if I remember correctly). Guppies have an XY sex determination system similar to humans. I think the most likely answer turned out to be death-in-embryo of males, due to some weird genetic interaction resulting from crosses between long-separated populations.

My "shuffle" feature in MS Windows Media Player also produces intuitively non-random results (I tend to hear many Hendrix songs in a row), but I'm not convinced it's not actually pretty good randomness. I would like a feature that would prevent a song being played twice in a string of n songs, though.
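One way such a feature might work is sketched below. This is an assumption for illustration, not how Windows Media Player or any other player actually does it. Each pick is random, but the last `n` songs played are excluded, so no song can come back until at least `n` others have played.

```python
import random
from collections import deque

def play_without_repeats(songs, n, count, seed=None):
    """Pick `count` songs at random, never repeating any of the last `n` played."""
    if n >= len(songs):
        raise ValueError("need more songs than the no-repeat window")
    rng = random.Random(seed)
    recent = deque(maxlen=n)  # remembers only the last n songs played
    played = []
    for _ in range(count):
        # Only songs outside the recent window are eligible.
        candidates = [s for s in songs if s not in recent]
        choice = rng.choice(candidates)
        recent.append(choice)
        played.append(choice)
    return played

library = ["track A", "track B", "track C", "track D", "track E"]  # placeholders
print(play_without_repeats(library, n=2, count=10))
```

Like Apple's smart shuffle, this makes the playlist less random in order to make it feel more random.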

Anonymous said...

infophile and thebrummell:
Thanks for the response. I thought it might be possible, but I didn't expect it was probable. However, that doesn't change the fact that the first girl in three generations is being spoiled beyond belief by mamaw and papaw. ;)

infophile said:
Most programmers get around this by setting the seed to be based on the current time whenever the program starts up.

This was an older version of Excel, early nineties, so it may have still had some problems with randomness. Then again, we started up at almost exactly the same time every day. So, unless they included everything about the current time (second, minute, hour, date, month, year), it would be foreseeable that it would start with the same one. It was/is a Microsoft program, after all.

-Berlzebub