Benford’s Law is a fascinating piece of mathematical trivia. Take many kinds of seemingly random data, and look at the first digit of each data point. Intuitively, you’d probably expect about as many 2’s as 3’s, 7’s, 8’s and so on. But this isn’t true. There are far more 1’s than anything else, and the rest of the digits fall into a distinct distribution: the higher the digit, the less often it leads. Read more about it at Wikipedia: “This counter-intuitive result applies to a wide variety of figures, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants.”
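For the curious, the distribution has a simple closed form: the chance that the leading digit is d is log10(1 + 1/d). That formula is the standard statement of the law. The little Python sketch below just evaluates it for each digit, and the function name is my own choice for illustration.

```python
import math

def benford_probability(digit: int) -> float:
    """Probability that a number's leading digit equals `digit` (1 to 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Print the predicted share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine probabilities add up to exactly 1, because the logarithms telescope to log10(10).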
This doesn’t hold for all kinds of data. Lottery numbers are drawn uniformly at random, so they don’t follow it. Height in inches will mostly start with 5s and 6s. On the other hand, this is one way to identify tax cheats. Just look at all the numbers on a return: on an honest one, they should follow this distribution. Cheaters who make up numbers tend to pick random-sounding numbers that aren’t random in the right way.
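Here is a rough sketch of what that kind of check could look like: pull the first nonzero digit off each figure and tally the shares. The helper names are hypothetical, and since I have no tax return to share, powers of two (a sequence well known to follow Benford’s Law) stand in for real figures.

```python
from collections import Counter

def leading_digit(value):
    """Return the first nonzero digit of a number, or None if it has none."""
    # Strip leading zeros and the decimal point, e.g. "0.00042" becomes "42".
    text = str(abs(value)).lstrip("0.")
    if text and text[0].isdigit():
        return int(text[0])
    return None

def first_digit_shares(values):
    """Map each leading digit 1..9 to its share of the usable values."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a leading digit")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Stand-in data (an assumption for illustration): the first 100 powers of two.
shares = first_digit_shares([2 ** n for n in range(100)])
for digit, share in shares.items():
    print(f"{digit}: {share:.1%}")
```

Swap in the figures from whatever you want to audit and compare the shares against the predicted distribution. A big mismatch is a reason to look closer, not proof of anything.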
I decided to check it out. I looked at some traffic data for a “random” website. I looked at how many people visited it by area of the country, and how many of them opted in to a database. The results follow:
Leading Digit   Benford’s Law prediction   DMA Visitors   DMA Opt-ins
1               30.1%                      30.0%          32.1%
2               17.6%                      16.4%          17.9%
3               12.5%                      12.7%          10.8%
4               9.7%                       8.9%           8.5%
5               7.9%                       6.6%           8.0%
6               6.7%                       7.0%           8.5%
7               5.8%                       6.1%           5.2%
8               5.1%                       7.5%           4.7%
9               4.6%                       4.7%           4.2%
Remarkable.
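To put a rough number on “remarkable”, the sketch below compares both columns of the table with the prediction. It reports the largest gap in percentage points and a Pearson chi-square statistic. The sample size of 210 is an assumption (one figure per DMA in each column; the actual counts behind the percentages aren’t given here), and the inputs are the rounded percentages, so treat the result as a ballpark.

```python
import math

# Percentages copied from the table above, for leading digits 1 through 9.
visitors = [30.0, 16.4, 12.7, 8.9, 6.6, 7.0, 6.1, 7.5, 4.7]
optins = [32.1, 17.9, 10.8, 8.5, 8.0, 8.5, 5.2, 4.7, 4.2]
benford = [100 * math.log10(1 + 1 / d) for d in range(1, 10)]

ASSUMED_SAMPLE_SIZE = 210  # assumption: one figure per DMA in each column
CRITICAL_VALUE = 15.507    # chi-square, 8 degrees of freedom, 5% level

def largest_gap(observed):
    """Largest absolute difference from Benford, in percentage points."""
    return max(abs(o - e) for o, e in zip(observed, benford))

def chi_square(observed, sample_size=ASSUMED_SAMPLE_SIZE):
    """Pearson chi-square statistic computed from percentage columns."""
    scaled = sum((o - e) ** 2 / e for o, e in zip(observed, benford))
    return sample_size * scaled / 100

for name, column in (("DMA Visitors", visitors), ("DMA Opt-ins", optins)):
    print(f"{name}: largest gap {largest_gap(column):.1f} points, "
          f"chi-square {chi_square(column):.2f} (critical value {CRITICAL_VALUE})")
```

A statistic below the critical value means the column shows no significant departure from Benford’s Law at the 5% level, under the sample-size assumption above.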
Update: “What’s a DMA?” It’s the way Nielsen breaks up the country for ratings, and it is the de facto standard for analyzing geographic performance for any company with a strong advertising component. It stands for Designated Market Area. There are 210 of them, so it’s a good amount of data for a test like this.
Well, sadly, I don’t believe an average Joe like me can log in to your site. There isn’t really any place to do so, at least that I can find on the main http://www.muttrox.com page or anywhere else. Not that it’s strictly necessary, as you state.
Oh, and I guess you don’t quite understand, I work for a company that has created their own GIS software. I’m not just someone who analyzes with the software. In fact, I build most of the demographic and cartographic data that they sell. In effect, I have access to ANY type of cartography you might want (along with the associated demographic data). 🙂 Not bragging, but just thought I’d elucidate a bit. Anyone need 1980 Census Native American household figures at the block group level? And a classified map to go along with it? lol Ain’t exciting, necessarily, but it’s a living.
Matt,
My company works by DMA mostly, that’s how I had the numbers handy. How funny that we’re both using it.
Edits – I can edit of course, but I can’t pretend to not be me, so I can’t see what you see. I think you *can* log in (I have to); you just don’t need to in order to post.
P.P.S – It’s kind of annoying that I can’t figure out how to put in a carriage return in a blog comment.
Also a bit annoying that I cannot edit my comment, but I guess without a login feature, that would be impossible. Hrmm…
It really doesn’t matter what geography you group the website hits by. Like you stated, this would work for simple population numbers by county, for all that it matters. It is important to note that, yes, this doesn’t work for every type of data, but it is extremely useful for certain kinds of fraud detection. I do wonder why they haven’t (to my knowledge) used it for voter fraud detection yet, since that would seem to be a perfect application.
P.S. – I ran this against population within DMAs (I actually work with Geographic Information Systems, so this sort of stuff falls somewhat under my area of expertise, lol), and it freakishly works out! This is some weird, wacky stuff (Carvey doing Carson).
Benford’s Law is named after the physicist Frank Benford, not to be confused with Greg Benford, the author, of whom geeks like me will immediately think.
Please explain what DMA is, so this entry makes more sense. Thanks.