Benford’s Law is a fascinating bit of mathematical trivia. Take many kinds of seemingly random data and look at the first digit of each data point. Intuitively, you’d probably expect about as many 2s as 3s, 7s, 8s, and so on. But that isn’t true. About 30% of the numbers start with 1, far more than with any other digit, and the remaining digits follow a distinct, steadily decreasing distribution. Read more about it at Wikipedia: “This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants.”
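The distribution has a simple closed form: the expected share of numbers whose leading digit is d is log10(1 + 1/d). Here is a minimal Python sketch of that formula. `benford_expected` is a hypothetical helper name, not part of any library. It reproduces the prediction column in the table further down.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share (0 to 1) of numbers whose leading digit is `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be 1 through 9")
    return math.log10(1 + 1 / digit)


# Print the nine expected shares as percentages, one per line.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

The nine shares add up to exactly 1, because the terms telescope to log10(10).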
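This doesn’t hold for all kinds of data. The pattern shows up in data that spans several orders of magnitude, and data confined to a narrow range doesn’t follow it. Lottery numbers are truly random, drawn evenly from a fixed range. Adult heights in inches mostly start with 6 or 7, because most adults are between 5 and 6½ feet (60 to 78 inches) tall. On the other hand, Benford’s Law is one way to identify tax cheats: look at all the numbers on a return, and they should follow this distribution. Cheaters who invent figures tend to pick numbers that sound random but aren’t distributed the right way.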
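A minimal sketch of that check in Python, assuming the figures have already been collected into a list of finite numbers. `leading_digit` and `leading_digit_shares` are illustrative names, not from any library. Zeros are skipped because they have no leading digit.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def leading_digit(x: float) -> int | None:
    """Return the first significant digit of a finite number, or None for zero."""
    x = abs(x)
    if x == 0:
        return None
    # Scientific notation puts the first significant digit first:
    # 0.0072 -> '7.200000000000000e-03', 2500 -> '2.500000000000000e+03'
    return int(f"{x:.15e}"[0])


def leading_digit_shares(values: Iterable[float]) -> dict[int, float]:
    """Fraction of nonzero values starting with each digit 1-9 (zeros are skipped)."""
    digits = (leading_digit(v) for v in values)
    counts = Counter(d for d in digits if d is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one nonzero value")
    return {d: counts[d] / total for d in range(1, 10)}


# Hypothetical figures, just to show the call:
print(leading_digit_shares([123, 0.045, 19, -2500, 0]))
```

For a tax return, `values` would be every dollar figure on it, and the resulting shares would be compared against the log10(1 + 1/d) predictions.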
I decided to check it out. I looked at some traffic data for a “random” website: how many people visited it from each area of the country, and how many of them opted in to a database. The results follow:
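| Leading digit | Benford’s Law prediction | DMA visitors | DMA opt-ins |
|---|---|---|---|
| 1 | 30.1% | 30.0% | 32.1% |
| 2 | 17.6% | 16.4% | 17.9% |
| 3 | 12.5% | 12.7% | 10.8% |
| 4 | 9.7% | 8.9% | 8.5% |
| 5 | 7.9% | 6.6% | 8.0% |
| 6 | 6.7% | 7.0% | 8.5% |
| 7 | 5.8% | 6.1% | 5.2% |
| 8 | 5.1% | 7.5% | 4.7% |
| 9 | 4.6% | 4.7% | 4.2% |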
Remarkable.
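To put a number on how closely the two columns track the prediction, one simple summary is the mean absolute deviation: the average gap between observed and expected percentages across the nine digits. A minimal Python sketch using the table’s figures (the variable and function names are illustrative):

```python
import math

# Percentages copied from the table (index 0 is leading digit 1).
DMA_VISITORS = [30.0, 16.4, 12.7, 8.9, 6.6, 7.0, 6.1, 7.5, 4.7]
DMA_OPTINS = [32.1, 17.9, 10.8, 8.5, 8.0, 8.5, 5.2, 4.7, 4.2]


def mean_absolute_deviation(observed_pct: list[float]) -> float:
    """Average gap, in percentage points, between observed shares and Benford's Law."""
    expected_pct = [100 * math.log10(1 + 1 / d) for d in range(1, 10)]
    return sum(abs(o - e) for o, e in zip(observed_pct, expected_pct)) / len(expected_pct)


for name, column in (("DMA visitors", DMA_VISITORS), ("DMA opt-ins", DMA_OPTINS)):
    print(f"{name}: {mean_absolute_deviation(column):.2f} percentage points")
```

On these figures it prints about 0.75 percentage points for visitors and 0.94 for opt-ins. This is a descriptive summary, not a significance test: the table reports percentages rather than counts, so a count-based test such as chi-square can’t be run from it directly.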
Update: “What’s a DMA?” It’s the way Nielsen divides the country for TV ratings, and it is the de facto standard for analyzing geographic performance at any company with a strong advertising component. It stands for Designated Market Area. There are 210 of them, so there’s a reasonable amount of data for a test like this.
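How much would these percentages wobble by chance with only 210 areas? Treating each DMA as one independent draw from the Benford distribution (an assumption: the table doesn’t say whether every DMA appears in both columns), the binomial standard error of each digit’s share is sqrt(p(1 - p)/n). A short Python sketch, with `benford_standard_error` as a hypothetical helper name:

```python
import math

N_DMAS = 210  # assumption: every DMA contributes exactly one count to a column


def benford_standard_error(digit: int, n: int = N_DMAS) -> float:
    """Binomial standard error, in percentage points, of the share starting with `digit`."""
    p = math.log10(1 + 1 / digit)  # Benford's expected share for this digit
    return 100 * math.sqrt(p * (1 - p) / n)


for d in range(1, 10):
    expected = 100 * math.log10(1 + 1 / d)
    print(f"{d}: expected {expected:.1f}% +/- {benford_standard_error(d):.1f}")
```

For n = 210 this gives roughly ±3.2 points for digit 1 and ±1.5 points for digit 8, so even the table’s largest gap (visitor counts starting with 8, 7.5% against 5.1%) is only about 1.6 standard errors.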