Ted Hill, one of the contributors to The Princeton Companion to Applied Mathematics, as well as the coauthor, with Arno Berger, of An Introduction to Benford’s Law, has written a post on this fascinating statistical phenomenon. You may be surprised by some of the unexpected places it turns up, from an analysis of Donald Trump’s finances to earthquake detection.
The acclaimed business and technology news website Business Insider proudly offers this advice to its readers, in capital letters:
The curious statistical phenomenon known as Benford’s Law, first discovered by Newcomb in 1881 and later rediscovered and popularized by Benford in 1938, is currently experiencing an explosion of research activity, especially in fraud detection ranging from tax data and digital images to clinical trial statistics, and from voting returns to macroeconomic data. Complementing these new forensic Benford tools, recent applications also include earthquake detection, analysis of Big Data and of errors in scientific computations, and diagnostic tests for mathematical models. As is common in developing fields, the quality of this research is all over the map, from scholarly and insightful to amusing and outlandish. The most recent Benford article I have seen is an analysis of Donald Trump’s finances, and I will let interested readers have fun judging these Benford articles for themselves. Most may be found on the open access and fully searchable Benford Online Bibliography, which currently references more than 800 articles on Benford’s Law, as well as other resources (books, websites, lectures, etc.).
The First-digit Law
In its most common formulation, the special case of the first significant (i.e., first non-zero) decimal digit, Benford’s Law says that the leading decimal digit is not equally likely to be any one of the nine possible digits 1, 2, …, 9, but rather follows the logarithmic distribution
$$\mathrm{Prob}(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right) \quad \text{for all } d = 1, 2, \ldots, 9,$$
where $D_1$ denotes the first significant decimal digit. Many numerical datasets follow this distribution, from mathematical tables like the Fibonacci numbers and powers of 2 to real-life data like the numbers appearing in newspapers, in tax returns, in eBay auctions, and in the meta-dataset of all numbers on the World Wide Web (see Figure 1).
For datasets like these that are close to being Benford, about 30% of the leading (non-zero) decimal digits are 1, about 18% are 2, and the proportions for the remaining digits decrease steadily, down to about 5% for the digit 9.
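As a quick check of these percentages, the short Python sketch below tabulates the Benford probabilities log10(1 + 1/d) next to the observed first-digit shares of the first 1000 powers of 2 and the first 1000 Fibonacci numbers, two of the mathematical tables mentioned above. The cutoff of 1000 terms is an arbitrary choice, and the function names are illustrative; exact integer arithmetic keeps every leading digit exact.

```python
import math
from collections import Counter


def benford_pmf() -> dict[int, float]:
    """Benford first-digit law: P(D1 = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_freqs(values) -> dict[int, float]:
    """Empirical share of each first significant digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# First 1000 powers of 2 and first 1000 Fibonacci numbers, as exact Python integers.
powers_of_2 = [2 ** n for n in range(1000)]
fibonacci = [1, 1]
while len(fibonacci) < 1000:
    fibonacci.append(fibonacci[-1] + fibonacci[-2])

expected = benford_pmf()
p2 = leading_digit_freqs(powers_of_2)
pf = leading_digit_freqs(fibonacci)
print(" d  Benford  2^n    Fibonacci")
for d in range(1, 10):
    print(f" {d}  {expected[d]:.3f}    {p2[d]:.3f}  {pf[d]:.3f}")
```

For both sequences every digit's share lands within about one percentage point of the Benford column.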
Figure 1. Empirical Evidence of Benford’s Law
The complete form of Benford’s Law also specifies the probabilities of occurrence of the second and higher significant digits, and more generally, the joint distribution of all the significant digits. For instance, the probability that a number has the same first three significant digits as π = 3.141… is
$$\mathrm{Prob}(D_1 = 3, D_2 = 1, D_3 = 4) = \log_{10}\left(1 + \frac{1}{314}\right) \approx 0.0014.$$
(For non-decimal bases b, the analogous law simply replaces decimal logarithms with logarithms base b.)
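The joint law behind the π example can be stated compactly: the probability that a Benford-distributed number begins with a given block of significant digits is log_b(1 + 1/n), where n is that block read as an integer in base b. The sketch below (function names are my own, for illustration only) evaluates this for the block 314, derives the marginal law of the second significant digit by summing over the first digit, and shows the base-2 analogue from the parenthetical remark.

```python
import math


def prob_leading_digits(digits: str, base: int = 10) -> float:
    """Probability that a Benford-distributed number begins with the significant-digit
    block `digits` (e.g. "314" for 3.14...): log_b(1 + 1/n), where n is `digits`
    read as a base-b integer."""
    if digits[0] == "0":
        raise ValueError("the first significant digit must be non-zero")
    n = int(digits, base)
    return math.log(1 + 1 / n, base)


def second_digit_pmf() -> dict[int, float]:
    """Marginal law of the second significant digit: sum the joint law over the first digit."""
    return {d2: sum(prob_leading_digits(f"{d1}{d2}") for d1 in range(1, 10))
            for d2 in range(10)}


print(f"P(D1=3, D2=1, D3=4) = {prob_leading_digits('314'):.5f}")
print({d: round(p, 4) for d, p in second_digit_pmf().items()})
print(f"base 2, P(D1=1, D2=0) = {prob_leading_digits('10', base=2):.4f}")
```

Summing the block probabilities over all 900 three-digit blocks 100 to 999 telescopes to exactly 1, which makes a convenient sanity check. The second digit is much closer to uniform than the first: its probabilities run only from about 0.120 for 0 down to about 0.085 for 9.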
Robustness of Benford’s Law
Benford’s Law is remarkably robust, which may help explain its ubiquity in both theory and applications. For example, it is the only distribution on significant digits that is scale-invariant (e.g., converting from dollars to euros or from feet to meters preserves Benford’s Law), and the only continuous distribution on significant digits that is base-invariant.
As an example of stochastic robustness, if a random variable X satisfies Benford’s Law, then so does XY for every positive random variable Y independent of X. Thus, when multiplying independent positive random variables, say to model stock prices, a single factor that obeys Benford’s Law is enough to make the whole product obey it. Moreover, if X follows Benford’s Law, then so do 1/X and $X^2$, and indeed all other non-zero integer powers of X.
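The scale-invariance and product properties in the last two paragraphs can be checked empirically. This Python sketch builds a sample that is exactly Benford by construction (X = 10^(k+U) with U uniform on [0, 1) and k a random integer), then measures the largest gap between first-digit shares and Benford’s Law for a rescaled copy (the factor 0.92 is an arbitrary stand-in for a currency conversion), for XY with an independent Y uniform on [1, 2), and for 1/X and X². The seed, sample size, and choice of Y are illustrative assumptions.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """First significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def max_deviation(sample) -> float:
    """Largest gap between observed first-digit shares and Benford's Law."""
    counts = [0] * 9
    for x in sample:
        counts[first_digit(x) - 1] += 1
    return max(abs(c / len(sample) - p) for c, p in zip(counts, BENFORD))


rng = random.Random(2015)  # arbitrary seed
N = 100_000
# X = 10**(k + U), U ~ Uniform[0, 1): log10(X) mod 1 is uniform, so X is exactly Benford.
X = [10 ** (rng.randrange(-3, 4) + rng.random()) for _ in range(N)]
# Y: positive, independent of X, and far from Benford on its own (every value starts with 1).
Y = [rng.uniform(1, 2) for _ in range(N)]

checks = {
    "X": X,
    "0.92 * X": [0.92 * x for x in X],  # rescaling by an illustrative conversion rate
    "X * Y": [x * y for x, y in zip(X, Y)],
    "1 / X": [1 / x for x in X],
    "X ** 2": [x ** 2 for x in X],
    "Y": Y,
}
for name, sample in checks.items():
    print(f"{name:9s} max deviation from Benford: {max_deviation(sample):.4f}")
```

Every transformed sample stays within sampling noise of Benford’s Law (well under 0.01 here), while Y on its own, whose values all begin with 1, is far from it.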
Benford’s Law is also robust under both additive and multiplicative errors: If an increasing unbounded sequence of values X obeys Benford’s Law, then so does X + E for every bounded “error” sequence E, and if X is Benford and E is any independent error with |E| < 1, then (1 + E)X is also exactly Benford.
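Both error-robustness statements can be illustrated the same way. In this sketch (my own construction, not from the source) the increasing, unbounded Benford sequence is 2^n for n = 1, …, 3000, perturbed by random integer errors bounded by 10^5 in absolute value. For the multiplicative case, an exactly Benford sample X is multiplied by 1 + E, with E independent of X and uniform on (-0.9, 0.9), so that |E| < 1.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x) -> int:
    """First significant decimal digit of a non-zero int (exact) or float."""
    if isinstance(x, int):
        return int(str(abs(x))[0])
    return int(f"{abs(x):.15e}"[0])


def max_deviation(sample) -> float:
    """Largest gap between observed first-digit shares and Benford's Law."""
    counts = [0] * 9
    for x in sample:
        counts[first_digit(x) - 1] += 1
    return max(abs(c / len(sample) - p) for c, p in zip(counts, BENFORD))


rng = random.Random(1881)  # arbitrary seed

# Additive: X_n = 2**n is increasing, unbounded and Benford; |E_n| <= 10**5 is bounded.
additive = [2 ** n + rng.randint(-10**5, 10**5) for n in range(1, 3001)]
additive = [v for v in additive if v != 0]  # zero has no significant digit

# Multiplicative: X exactly Benford, E independent of X with |E| < 1 (so 1 + E > 0).
X = [10 ** rng.random() for _ in range(100_000)]
multiplicative = [(1 + rng.uniform(-0.9, 0.9)) * x for x in X]

print(f"2**n + E_n : {max_deviation(additive):.4f}")
print(f"(1 + E) * X: {max_deviation(multiplicative):.4f}")
```

In the additive case only the first couple of dozen terms, where 2^n is still comparable to the error bound, have their leading digits disturbed; the rest keep the digits of 2^n, which is why a bounded error cannot change the limiting digit frequencies.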
Applications of Benford’s Law
The most widespread current application of Benford’s Law is fraud detection. The idea here is simple: if genuine data of a certain type are known to be close to Benford’s Law, then chi-squared goodness-of-fit tests can serve as a simple “red flag” test for data fabrication or falsification. A close fit or a poor fit proves nothing by itself, but a poor fit raises the level of suspicion, at which point independent (non-Benford) tests or monitoring may be applied.
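A minimal version of the chi-squared “red flag” test might look like the sketch below. It assumes a 5% significance level (critical value 15.507 for 9 - 1 = 8 degrees of freedom) and uses two illustrative inputs of my own: 1000 values placed at the quantiles of the Benford distribution, and the consecutive integers 100 to 999, whose first digits are spread evenly over 1 to 9.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
# Upper 5% point of the chi-squared distribution with 9 - 1 = 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507


def first_digit(x: float) -> int:
    """First significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def benford_chi2(values) -> float:
    """Pearson chi-squared statistic comparing observed first digits with Benford's Law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def red_flag(values) -> bool:
    """True if the fit is poor at the 5% level: grounds for a closer look, not proof of fraud."""
    return benford_chi2(values) > CHI2_CRITICAL_8DF_5PCT


# Illustrative inputs: 1000 values at the Benford quantiles on [1, 10),
# versus 900 consecutive integers whose first digits are spread evenly over 1..9.
benford_like = [10 ** ((i + 0.5) / 1000) for i in range(1000)]
evenly_spread = list(range(100, 1000))
print(f"{benford_chi2(benford_like):.2f}", red_flag(benford_like))    # tiny statistic, False
print(f"{benford_chi2(evenly_spread):.1f}", red_flag(evenly_spread))  # large statistic, True
```

With very large datasets, even small and innocent departures from Benford’s Law produce large chi-squared values, which is one more reason to treat the result as a flag for further scrutiny rather than as evidence.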
A similar approach is being used to detect changes in natural processes. If the significant digits are close to Benford’s Law when the process is in one particular state, but not when it is in a different state, then comparison with Benford’s Law can help identify when the process changes state. Recent studies have reported successful use of Benford’s Law tests to detect earthquakes, phase transitions in quantum many-body problems, different states of anesthesia, signal modulations in electrophysiological recordings, and output changes in interventional radiology.
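One simple way to put this into practice is to cut a signal into non-overlapping windows and compare each window’s first-digit shares with Benford’s Law. The sketch below uses the mean absolute deviation (MAD) rather than a chi-squared test, a window of 1000 samples, and a threshold of 0.02. All three are illustrative assumptions, not settings from the studies cited, and the two-state signal is synthetic.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """First significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def benford_mad(window) -> float:
    """Mean absolute deviation between observed first-digit shares and Benford's Law."""
    counts = [0] * 9
    for x in window:
        counts[first_digit(x) - 1] += 1
    return sum(abs(c / len(window) - p) for c, p in zip(counts, BENFORD)) / 9


def scan_windows(series, window=1000, threshold=0.02):
    """Split `series` into non-overlapping windows and report which look Benford-like."""
    report = []
    for start in range(0, len(series) - window + 1, window):
        mad = benford_mad(series[start:start + window])
        report.append((start, start + window - 1, mad, mad < threshold))
    return report


# Hypothetical signal: a "state A" spread over five orders of magnitude (Benford-like),
# followed by a "state B" confined to a narrow band (not Benford-like).
rng = random.Random(0)
series = [10 ** rng.uniform(0, 5) for _ in range(3000)]
series += [rng.uniform(20, 40) for _ in range(2000)]
for first, last, mad, ok in scan_windows(series):
    print(f"samples {first:4d}-{last:4d}: MAD = {mad:.4f} ->",
          "Benford-like" if ok else "change of state?")
```

The first three windows conform and the last two do not, so the switch between states shows up at sample 3000.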
Tests for goodness-of-fit to Benford’s Law are also useful as a diagnostic tool for assessing the appropriateness of mathematical models. If current and past data obey Benford’s Law, it is reasonable to expect that future data will also obey Benford’s Law. For example, the 1990, 2000, and 2010 census statistics of the populations of the roughly three thousand counties in the United States follow Benford’s Law very closely (see Figure 1), so to evaluate a proposed mathematical model’s prediction of future populations, simply enter current values as input and then check how closely the output of that model agrees with Benford’s Law (see Figure 2).
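The Benford-in-Benford-out check can be sketched as follows. Everything here is hypothetical: the inputs are a synthetic log-uniform stand-in for county populations (not census data), the two models are toy examples, and the MAD threshold of 0.006 is an arbitrary cutoff. A model that multiplies each population by an independent growth factor preserves Benford’s Law, as the product property described earlier predicts, whereas adding the same 50,000 residents to every county does not.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x: float) -> int:
    """First significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


def benford_mad(values) -> float:
    """Mean absolute deviation between observed first-digit shares and Benford's Law."""
    counts = [0] * 9
    for x in values:
        counts[first_digit(x) - 1] += 1
    return sum(abs(c / len(values) - p) for c, p in zip(counts, BENFORD)) / 9


def benford_in_benford_out(model, inputs, threshold=0.006) -> bool:
    """Run Benford-conforming inputs through `model`; True if the outputs still conform."""
    return benford_mad([model(x) for x in inputs]) < threshold


rng = random.Random(2010)
# Synthetic stand-in for current county populations: log-uniform on [10**2, 10**6),
# which is exactly Benford. (Not real census data.)
populations = [10 ** rng.uniform(2, 6) for _ in range(50_000)]


def growth_model(pop: float) -> float:
    """Hypothetical model: ten years of a county-specific growth rate between -1% and 3%."""
    return pop * (1 + rng.uniform(-0.01, 0.03)) ** 10


def additive_model(pop: float) -> float:
    """Hypothetical model: every county gains the same 50,000 residents."""
    return pop + 50_000


for model in (growth_model, additive_model):
    print(model.__name__, "passes" if benford_in_benford_out(model, populations) else "fails")
```

A failed check does not prove a model wrong, but it suggests that its outputs are distributed differently from the data the model is meant to extend.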
Figure 2. Benford-in-Benford-out Diagnostic Test
The appearance of Benford’s Law in real-life scientific computations is now widely accepted, both as an empirical fact (as reported in Knuth’s classic text) and as a mathematical fact (e.g., Newton’s method and related numerical algorithms have recently been shown to follow Benford’s Law). Thus, in those scientific calculations where Benford’s Law is expected to occur, knowledge of the distribution of the algorithm’s output permits better estimates of both round-off and overflow/underflow errors.
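To make the round-off remark concrete, here is a worked example of my own rather than one from the source. If a number with significand m in [1, 10) is rounded to p significant decimal digits, its relative error is roughly the significand’s rounding error divided by m, so the average relative error scales with E[1/m]. Under Benford’s Law the significand has density 1/(m ln 10), giving E[1/m] = (1 - 1/10)/ln 10 = 0.9/ln 10 ≈ 0.391, versus ln 10/9 ≈ 0.256 if significands were uniform on [1, 10). Benford-distributed data therefore incur about 53% more average relative rounding error. The sketch checks this by simulation with p = 4, an arbitrary choice.

```python
import math
import random


def round_sig(x: float, p: int) -> float:
    """Round a positive x to p significant decimal digits."""
    return float(f"{x:.{p - 1}e}")


def mean_relative_rounding_error(sample, p: int) -> float:
    """Average |round(x) - x| / x when each x is rounded to p significant digits."""
    return sum(abs(round_sig(x, p) - x) / x for x in sample) / len(sample)


P = 4                      # illustrative precision: 4 significant decimal digits
SPACING = 10.0 ** (1 - P)  # gap between adjacent 4-digit significands in [1, 10)
# E[1/m] for the significand m in [1, 10):
inv_m = {
    "Benford": 0.9 / math.log(10),   # density 1 / (m ln 10)
    "uniform": math.log(10) / 9,     # density 1 / 9
}
# The significand's rounding error is roughly uniform on [-SPACING/2, SPACING/2], so its mean
# absolute value is SPACING/4 and the expected relative error is about (SPACING/4) * E[1/m].
predicted = {name: SPACING / 4 * v for name, v in inv_m.items()}

rng = random.Random(42)
samples = {
    "Benford": [10 ** rng.random() for _ in range(200_000)],
    "uniform": [rng.uniform(1, 10) for _ in range(200_000)],
}
observed = {name: mean_relative_rounding_error(s, P) for name, s in samples.items()}
for name in samples:
    print(f"{name:8s} predicted {predicted[name]:.3e}   simulated {observed[name]:.3e}")
```

The simulated averages agree with the closed-form predictions to within a few tenths of a percent. Assuming uniformly distributed significands would therefore understate the typical relative round-off error for Benford-distributed data.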
Recent Theoretical Developments
Complementing these applications are new theoretical advancements, which are useful in explaining and predicting when Benford analysis is appropriate, and which are also of independent mathematical interest. Recent results include:
- The outputs of many numerical algorithms, including Newton’s method, obey Benford’s Law.
- Iterations of most linear functions follow Benford’s Law exactly, and iterations of most functions close to linear, such as $f(x) = 2x + e^{-x}$, also follow Benford’s Law exactly.
- Continuous functions with exponential or super-exponential growth or decay typically exhibit Benford’s Law behavior, and thus wide classes of initial value problems obey Benford’s Law exactly.
- Powers and products of very general classes of random variables, including all random variables with densities, approach Benford’s Law in the limit (see Figure 3 for the standard uniform case).
- Many multidimensional systems, such as powers of large classes of square matrices and Markov chains, obey Benford’s Law.
- Large classes of stochastic processes, including geometric Brownian motion and many Lévy processes, obey Benford’s Law.
- If random samples from different randomly selected probability distributions are combined, the resulting meta-sample also typically converges to Benford’s Law. (This may help explain why numbers on the World Wide Web, in newspapers, and in combined financial data have been found to follow Benford’s Law.)
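The near-linear example $f(x) = 2x + e^{-x}$ from the list above is easy to explore numerically. This sketch iterates it 1000 times from x0 = 1 (an arbitrary starting point) in ordinary double precision and compares the first-digit shares of the orbit with Benford’s Law. Once x is large, e^(-x) is negligible and each step is essentially a doubling, so the orbit behaves like a constant multiple of 2^n. The function names are illustrative.

```python
import math
from collections import Counter


def near_linear_map(x: float) -> float:
    """f(x) = 2x + e^(-x), the near-linear example from the list above."""
    return 2 * x + math.exp(-x)


def orbit(x0: float, n: int) -> list[float]:
    """The first n iterates x0, f(x0), f(f(x0)), ..."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(near_linear_map(xs[-1]))
    return xs


def first_digit(x: float) -> int:
    """First significant decimal digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])


# 1000 iterates from x0 = 1 grow roughly like 1.2 * 2**n, staying below the
# float overflow limit (about 1.8e308).
xs = orbit(1.0, 1000)
counts = Counter(first_digit(x) for x in xs)
for d in range(1, 10):
    print(d, f"observed {counts[d] / len(xs):.3f}", f"Benford {math.log10(1 + 1 / d):.3f}")
```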
Figure 3. Powers of a Uniform Random Variable
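For the standard uniform case in Figure 3, the first-digit law of U^n can be computed exactly. U^n lies in [d·10^(-k), (d+1)·10^(-k)) with probability 10^(-k/n)·((d+1)^(1/n) - d^(1/n)), and summing this geometric series over k ≥ 1 gives a closed form that equals 1/9 for every digit when n = 1 and tends to log10(1 + 1/d) as n grows. The derivation and the Monte Carlo check below are my own sketch, not taken from the source; the chosen exponents and seed are arbitrary.

```python
import math
import random

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_pmf_uniform_power(n: int) -> list[float]:
    """Exact P(first significant digit of U**n = d), d = 1..9, for U ~ Uniform(0, 1).

    U**n lies in [d * 10**-k, (d + 1) * 10**-k) with probability
    10**(-k/n) * ((d + 1)**(1/n) - d**(1/n)); summing the geometric series
    over k >= 1 gives the closed form below.
    """
    r = 10 ** (-1 / n)
    return [((d + 1) ** (1 / n) - d ** (1 / n)) * r / (1 - r) for d in range(1, 10)]


def gap_from_benford(pmf: list[float]) -> float:
    """Largest absolute difference between a first-digit law and Benford's Law."""
    return max(abs(p - b) for p, b in zip(pmf, BENFORD))


for n in (1, 2, 5, 10, 50):
    pmf = first_digit_pmf_uniform_power(n)
    print(f"n = {n:2d}: P(D1 = 1) = {pmf[0]:.4f}, max gap from Benford = {gap_from_benford(pmf):.4f}")

# Monte Carlo sanity check of the closed form for n = 5.
rng = random.Random(3)
draws = [u ** 5 for u in (rng.random() for _ in range(100_000)) if u > 0]
share_of_ones = sum(1 for x in draws if f"{x:.15e}"[0] == "1") / len(draws)
print(f"simulated P(D1 = 1) for n = 5: {share_of_ones:.4f}")
```

The gap shrinks only slowly, roughly in proportion to 1/n: even for U^50 the probability of a leading 1 is still about half a percentage point below Benford’s 30.1%.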
The study of Benford’s Law has also at times been entertaining. I’ve been contacted about its use to support various religious philosophies (including evidence of Benford’s Law in the Bible and the Quran, and its appearance in tables of the earth’s elements as evidence of Intelligent Design), as well as about a website where Eastern European entrepreneurs sold Benford data, for 25 euros a pop, to people who needed it. For me, however, the main attraction has been its wealth of fascinating and challenging mathematical questions.
Ted Hill is Professor Emeritus of Mathematics at the Georgia Institute of Technology, and currently Research Scholar in Residence at the California Polytechnic State University in San Luis Obispo. He is the coauthor, with Arno Berger, of An Introduction to Benford’s Law (Princeton University Press, 2015).