Benfordʼs Law Geometry
Abstract and Keywords
This chapter switches from the traditional analysis of Benford's law using data sets to a search for probability distributions that obey Benford's law. It begins by briefly discussing the origins of Benford's law through the independent efforts of Simon Newcomb (1835–1909) and Frank Benford, Jr. (1883–1948), both of whom made their discoveries through empirical data. Although Benford's law applies to a wide variety of data sets, none of the popular parametric distributions, such as the exponential and normal distributions, agree exactly with Benford's law. The chapter thus highlights the failures of several of these well-known probability distributions in conforming to Benford's law, considers what types of probability distributions might produce data that obey Benford's law, and looks at some of the geometry associated with these probability distributions.
Keywords: Benford's law geometry, Simon Newcomb, Frank Benford, parametric distributions, probability distributions, geometry
The original discovery of Benford’s Law by Simon Newcomb was based on the uneven wear in the pages of logarithm tables. The subsequent independent discovery by Frank Benford was based on conformance of the first digit from a diverse set of data to Benford’s Law; both of these discoveries were driven by empirical data. Although Benford’s Law applies to a wide variety of data sets, none of the popular parametric distributions, such as the exponential and normal distributions, agree exactly with Benford’s Law. After highlighting the failures of several well-known probability distributions in conforming to Benford’s Law, we consider what types of probability distributions might produce data that obey Benford’s Law, and look at some of the geometry associated with these probability distributions.
4.1 Introduction
Simon Newcomb (1835–1909) was a largely self-taught American who immigrated from Canada with professional interests in astronomy and mathematics. During his lifetime, calculations were typically performed using logarithm tables. Newcomb noticed that the pages of tables of logarithms had more wear at the beginning of the tables than at the end. The argument to a logarithm table ranges from 1.0 to 10.0 and is arranged in a linear fashion (for example, the arguments between 1.0 and 2.0 take exactly 1/9 of the pages), yet the tables showed more wear on the earlier pages. Newcomb postulated that those using the logarithm tables tended to look up the largest fraction of values beginning with the digit 1 and the smallest fraction of values beginning with the digit 9. In what can be considered an astoundingly insightful conclusion, particularly considering that his data set consisted only of worn pages, he postulated that the distribution of the leading digit X of numbers accessed in the logarithm tables followed a discrete probability distribution with probability mass function

P(X = x) = log_{10}(1 + 1/x), x = 1, 2, … , 9.
Newcomb published what became known as the “logarithm law” in the American Journal of Mathematics in 1881. Considering just the extreme values, this law indicates that over 30% of the arguments to a logarithm table will have a leading digit of 1 because P(X = 1) = log_{10}(2) ≅ 0.301, and less than 5% of the arguments to a logarithm table will have a leading digit of 9 because P(X = 9) = log_{10}(10/9) ≅ 0.0458.
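The full logarithm-law distribution can be tabulated directly; a quick sketch in R (the vector name benford is ours, not from the chapter):

```r
# Leading-digit probabilities under the logarithm law:
# P(X = d) = log10(1 + 1/d) for d = 1, ..., 9
benford <- log10(1 + 1 / (1:9))
print(round(benford, 3))
print(benford[1])  # log10(2), about 0.301
print(benford[9])  # log10(10/9), about 0.0458
```

The nine probabilities sum to 1 because the sum telescopes: Σ log10((d + 1)/d) = log10(10) = 1.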
Frank Benford, Jr. (1883–1948) was an electrical engineer and physicist who spent his career working for General Electric. He apparently independently arrived at the same conclusion as Newcomb concerning the distribution of the leading digit. His rediscovery of what has been named “Benford’s Law” came from his collection of “data from as many fields as possible” to determine whether natural and sociological data sets would obey the logarithm law ([Ben]). In 1938 Benford analyzed the leading digits of 20,229 data values that he had gathered from a diverse set of sources (for example, populations of counties, American League baseball statistics, numbers appearing in Reader’s Digest, areas of rivers, physical constants, death rates, drainage rates of rivers, atomic weights). The proportions associated with each of the leading digits are given in Table 4.1, which are a very close fit to Benford’s Law.
Table 4.1 Benford’s leading digit frequencies.

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Benford’s Law | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
| Data | 0.306 | 0.185 | 0.124 | 0.094 | 0.080 | 0.064 | 0.051 | 0.049 | 0.047 |
In hindsight, we know part of the explanation of why Benford’s data came so close to the proposed distribution of leading digits. First of all, the data set contained observations that spanned several orders of magnitude. This is not a requirement for conformity to Benford’s Law, but it seems to help. As shown later in this chapter, a probability distribution can satisfy Benford’s Law and span only a single order of magnitude. Second, by choosing such a wide array of data values, Benford was effectively mixing several probability distributions together, and it has been seen that this also enhances conformance to Benford’s Law.
Since certain data sets seem to approximate Benford’s Law with regularity, a reasonable next step is to search for probability distributions that give rise to data that conforms to Benford’s Law. Hill ([Hi4]) framed the question well: “An interesting open problem is to determine which common distributions (or mixtures thereof) satisfy Benford’s Law … .” This chapter switches from the traditional analysis of Benford’s Law using data sets to a search for probability distributions that obey Benford’s Law.
The analogous search occurred in the early days of probability theory when analysts found so many measurements that produced data that was bell shaped that they named the associated probability distribution the “normal” distribution. (Perhaps any non-bell-shaped distribution was considered to be “abnormal” at the time.) Most of what is known as classical statistics emerged from the derivation of the probability density function of the normal, or Gaussian, distribution.
In order to limit the focus of this chapter, the following assumptions will be made.
• The focus of the analysis is on probability distributions rather than data.
• A probability distribution that might obey Benford’s Law is associated with a continuous random variable.
• The probability distribution has support on the positive real numbers or some subset thereof.
• Base 10 is used to represent random variables associated with probability distributions.
• Only the leading digit is of interest. All digits to the right of the leading digit are ignored.
Other chapters in this book (especially Chapter 2) concern mathematical results associated with relaxation of these assumptions.
The next section considers some popular parametric distributions and assesses their conformity to Benford’s Law. The following section considers probability distributions that obey Benford’s Law exactly and the geometric and algebraic properties that they possess. The last section contains conclusions.
4.2 Common Probability Distributions
As given in the previous section, let X be a discrete random variable whose support is the integers 1, 2, … , 9 with probability mass function

P(X = x) = log_{10}(1 + 1/x), x = 1, 2, … , 9.
In this chapter, the term “Benford distribution” is used to describe this probability distribution. The random variable X has the following associated cumulative distribution function on its support values:

F_X(x) = P(X ≤ x) = log_{10}(1 + x), x = 1, 2, … , 9.
Using the probability integral transformation, random variates having the Benford distribution are generated via

X = ⌈10^U − 1⌉,

where U is a uniform random variable on [0, 1], denoted by U ∼ U(0, 1).
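A minimal sketch of this generator in R (the formula X = ⌈10^U − 1⌉ follows from inverting the cumulative distribution function F_X(x) = log_{10}(1 + x); the sample size and seed are arbitrary):

```r
# Benford variate generation by the probability integral transformation:
# X = ceiling(10 ^ U - 1) inverts F_X(x) = log10(1 + x)
set.seed(1)
u <- runif(100000)
x <- ceiling(10 ^ u - 1)
# empirical digit frequencies, close to log10(1 + 1 / (1:9))
print(round(as.vector(table(x)) / length(x), 3))
```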
We now define the probability distribution from which the data will be drawn. Let the continuous random variable T have positive support and cumulative distribution function F_{T}(t) = P(T ≤ t). We are interested in the leading digit of a realization of T, which we obtain through the significand function. (Recall we may write any positive number x uniquely as S(x) · 10^{k(x)}, where S(x) ∈ [1, 10) and k(x) is an integer; S is called the significand function.) For example, S(e) = S(10e) = S(e/100) = 2.71828 …. Using the significand function, the leading digit Y of T can be expressed as

Y = ⌊S(T)⌋.
The next calculation that is necessary is to determine the probabilities associated with the nine potential leading digits of T having an arbitrary probability distribution. The probability mass function of Y is

P(Y = y) = Σ_{i = −∞}^{∞} [F_T((y + 1) · 10^i) − F_T(y · 10^i)]

for y = 1, 2, … , 9.
Since Benford’s Law seems to apply to a variety of data sets, one would assume that several of the popular parametric models, such as the exponential or Weibull distributions, would provide a close fit to Benford’s Law. For certain choices of the parameters of some of these distributions this is in fact the case. The probability mass function of Y was calculated for the U(1, 10), unit exponential, and unit Rayleigh distributions, which respectively have cumulative distribution functions

F_T(t) = (t − 1)/9, 1 ≤ t ≤ 10,

F_T(t) = 1 − e^{−t}, t > 0,

and

F_T(t) = 1 − e^{−t²}, t > 0.
The following R code, with a carefully selected lower bound lo and a carefully selected upper bound hi in order to ensure that nearly all of the probability density is captured, calculates the probability mass function of Y for the unit Rayleigh distribution.
lo = -10   # lower bound on orders of magnitude (one careful choice)
hi = 10    # upper bound on orders of magnitude
cdf = function(x) 1 - exp(-x ^ 2)
digits = rep(0, 9)
for (y in 1:9) {
  for (i in lo:hi) {
    digits[y] = digits[y] + cdf((y + 1) * 10 ^ i) - cdf(y * 10 ^ i)
  }
}
print(digits)
The results of these calculations for all three probability distributions are shown in Table 4.2. The U (1, 10) distribution provides the worst fit of the three probability distributions because each leading digit is equally likely to occur. The unit exponential probability mass function is monotone like the Benford probability mass function, but it gives too many 1s, 8s, and 9s relative to the Benford distribution. Finally, the unit Rayleigh distribution deviates even further from Benford’s Law. Although the unit exponential distribution is the best of the three in terms of proximity to Benford’s distribution, none of these perform even as well as Benford’s original data set.
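The same loop can be wrapped in a small helper and pointed at each of the three cumulative distribution functions (the function name digit_pmf and its default bounds are our own, not from the chapter):

```r
# Leading-digit pmf for an arbitrary cdf with positive support,
# summing contributions over orders of magnitude lo:hi
digit_pmf <- function(cdf, lo = -30, hi = 30) {
  sapply(1:9, function(y)
    sum(cdf((y + 1) * 10 ^ (lo:hi)) - cdf(y * 10 ^ (lo:hi))))
}
print(round(digit_pmf(function(t) pmin(pmax((t - 1) / 9, 0), 1)), 3))  # U(1, 10)
print(round(digit_pmf(pexp), 3))                                       # unit exponential
print(round(digit_pmf(function(t) 1 - exp(-t ^ 2)), 3))                # unit Rayleigh
```

Each call should reproduce the corresponding row of Table 4.2.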
Having failed to find a distribution that closely approximates Benford’s Law, the search widens for probability distributions that provide a closer approximation. A
Table 4.2 Leading digit frequencies for common probability distributions.

| Leading digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Benford’s Law | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
| U(1, 10) distr. | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 |
| Unit exponential distr. | 0.330 | 0.174 | 0.113 | 0.086 | 0.073 | 0.064 | 0.058 | 0.053 | 0.049 |
| Unit Rayleigh distr. | 0.379 | 0.066 | 0.063 | 0.074 | 0.082 | 0.087 | 0.087 | 0.084 | 0.079 |
probability distribution that provides a surprisingly close approximation to Benford’s Law is the log-normal distribution with cumulative distribution function

F_T(t) = Φ((ln t − μ)/σ), t > 0,

where Φ is the standard normal cumulative distribution function, μ is a real-valued parameter, and σ is a positive real-valued parameter. We arbitrarily set μ = 0 and gradually increase σ. Since the approximation to Benford’s Law is very close for the log-normal distribution, we use a measure similar to the Kolmogorov–Smirnov goodness-of-fit test statistic to assess the fit:

d = max_{1 ≤ y ≤ 9} |P(Y ≤ y) − P(X ≤ y)|.
Table 4.3 gives the value of d for several values of σ.
Table 4.3 Assessing conformance to Benford’s Law for the log-normal distribution.

| σ | 1/4 | 1/2 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| d | 1.96 × 10^{−1} | 1.17 × 10^{−1} | 7.30 × 10^{−3} | 1.03 × 10^{−7} | 1.03 × 10^{−15} |
The log-normal distribution appears empirically to be approaching Benford’s Law as σ increases (see Theorem 3.2.2 and Corollary 5.4.7 for additional theoretical support). What is it about the log-normal distribution that makes this occur? The geometry behind why certain distributions conform well to Benford’s Law is taken up in the next section.
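A sketch of the computation behind Table 4.3 in R, assuming d is the maximum absolute difference between the two cumulative distribution functions over the nine digits, and that σ is the sdlog parameter of R’s plnorm (the helper name d_stat and the magnitude bounds are ours):

```r
# d for the lognormal(0, sigma): maximum cdf distance between the
# leading-digit distribution of T and the Benford distribution
d_stat <- function(sigma, lo = -40, hi = 40) {
  p <- sapply(1:9, function(y)
    sum(plnorm((y + 1) * 10 ^ (lo:hi), meanlog = 0, sdlog = sigma) -
        plnorm(y * 10 ^ (lo:hi), meanlog = 0, sdlog = sigma)))
  max(abs(cumsum(p) - cumsum(log10(1 + 1 / (1:9)))))
}
print(sapply(c(1 / 4, 1 / 2, 1), d_stat))  # compare with Table 4.3
```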
4.3 Probability Distributions Satisfying Benford’s Law
Rather than looking at the well-known probability distributions considered by probabilists for modeling or by statisticians for statistical inference, we try to construct probability distributions that satisfy Benford’s Law exactly. One of the key insights gleaned from the last section is that random variables whose logarithm has a symmetric distribution have a good chance of satisfying Benford’s Law.
We initially look past the obvious continuous probability distribution that satisfies Benford’s Law by brute force, that is, the piecewise-constant probability density function

f_{T}(t) = log_{10}(1 + 1/⌊t⌋), 1 ≤ t < 10.

This distribution spans just one order of magnitude, dividing the support between 1 and 10 into nine cells [y, y + 1), one for each leading digit y, with probabilities log_{10}(1 + 1/y) that match Benford’s distribution exactly.
In order to find a nontrivial distribution with exact conformance to Benford’s Law, we define another random variable: W = log_{10} T. It is easier to construct distributions that conform to Benford’s Law by working with W rather than T. For example, we let W ∼ U(0, 1), which has probability density function

f_W(w) = 1, 0 < w < 1.
By using the transformation technique (see for example [HogMC]), the distribution of T = 10^{W} has probability density function

f_T(t) = 1/(t ln 10), 1 < t < 10.
This probability distribution is, from one point of view, the primary continuous distribution whose leading digit satisfies Benford’s Law because (a) its base 10 logarithm is U (0, 1), (b) its support spans a single order of magnitude, and (c) its probability density function is a continuous function (unlike the probability density function described in the previous paragraph). Figure 4.1 shows the probability density function of W on the left-hand graph and the probability density function of T on the right-hand graph. The shaded areas on the graphs correspond to the probability that the leading digit is 4 (the digit 4 was an arbitrary choice).
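The claim of exact conformance is a one-line calculation, since the density f_T(t) = 1/(t ln 10) on (1, 10) has cumulative distribution function F_T(t) = log_{10}(t); a quick check in R:

```r
# With f_T(t) = 1 / (t * log(10)) on (1, 10), the cdf is F_T(t) = log10(t),
# so each digit cell [y, y + 1) gets probability log10(1 + 1/y) exactly
FT <- function(t) log10(t)
p <- FT(2:10) - FT(1:9)
print(all.equal(p, log10(1 + 1 / (1:9))))  # TRUE
```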
This example can be extended to cover two orders of magnitude simply by letting W ∼ U(0, 2). This distribution also satisfies Benford’s Law exactly. In this case, the probability density function of W is

f_W(w) = 1/2, 0 < w < 2.
The distribution of T = 10^{W} has probability density function

f_T(t) = 1/(2t ln 10), 1 < t < 100.
Figure 4.2 shows the probability density function of W on the left-hand graph and the probability density function of T on the right-hand graph. The shaded areas on the graphs correspond to the probability that the leading digit is 4. Since two orders of magnitude are spanned by the support of T, there are two ranges (4 ≤ T < 5 and 40 ≤ T < 50) that result in having Y = 4 as a leading digit.
The previous two distributions that satisfy Benford’s Law exactly can be generalized to cover all uniform distributions for W that cover an integer number of orders of magnitude. Let W ∼ U (a, b). As long as b − a is a positive integer, then the support of T is 10^{a} < T < 10^{b}, which covers b − a orders of magnitude. Benford’s Law is satisfied exactly because the effect of picking off the leading digit shifts all W values into the interval (0, 1) and shifts all corresponding T values into the interval (1, 10). When b − a is an integer, the support of W that falls outside of (0, 1) that is shifted into the unit interval does so in a fashion that results in T following Benford’s Law, as will be seen geometrically in the next paragraph. For example, if W ∼ U (3.507, 6.507), then the support of T spans exactly three orders of magnitude and it obeys Benford’s Law exactly.
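A quick Monte Carlo check of the W ∼ U(3.507, 6.507) example (the sample size and seed are arbitrary):

```r
# Sample W uniformly over exactly three orders of magnitude and
# tabulate the leading digit of T = 10 ^ W
set.seed(17)
w <- runif(1e6, 3.507, 6.507)
t <- 10 ^ w
lead <- floor(t / 10 ^ floor(log10(t)))
print(round(as.vector(table(lead)) / length(lead), 3))  # close to Benford
```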
There are non-uniform distributions for W that also satisfy Benford’s Law. One simple example is to allow W to have the triangular distribution with minimum 0, mode 1, and maximum 2. In this case, the probability density function of W is

f_W(w) = w for 0 < w ≤ 1 and f_W(w) = 2 − w for 1 < w < 2.
The distribution of T = 10^{W} has probability density function

f_T(t) = log_{10}(t)/(t ln 10) for 1 < t ≤ 10 and f_T(t) = (2 − log_{10}(t))/(t ln 10) for 10 < t < 100.
Figure 4.3 shows the probability density function of W on the left-hand graph and the probability density function of T on the right-hand graph. The shaded areas on the graphs correspond to the probability that the leading digit is 4. Since two orders of magnitude are again spanned by the support of T, there are two ranges (4 ≤ T < 5 and 40 ≤ T < 50) that result in having Y = 4 as a leading digit. The geometry associated with picking off the leading digit is most easily seen by considering the support of W. The probability density function on the range 1 < w < 2 is shifted to the left by one unit, as seen in Figure 4.4. The two shaded bars from Figure 4.3 are stacked on top of one another, reaching the dashed line in Figure 4.4; this stacking provides the basis for the conformance to Benford’s Law.
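The stacking can also be verified numerically: on 0 < w < 1 the two triangular(0, 1, 2) density pieces f_W(w) and f_W(w + 1) sum to the constant 1. A sketch in R:

```r
# Shift the (1, 2) piece of the triangular(0, 1, 2) density left by one
# unit and stack it on the (0, 1) piece: the result is the U(0, 1) density
fW <- function(w) ifelse(w < 1, w, 2 - w)
w <- seq(0.005, 0.995, by = 0.005)
stacked <- fW(w) + fW(w + 1)
print(all(abs(stacked - 1) < 1e-12))  # TRUE
```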
All of the examples of random variables W that satisfy Benford’s Law considered so far have been symmetric distributions. We now consider the case of a nonsymmetric distribution of W that satisfies Benford’s Law exactly. Let W ∼ triangular(0, 1, 3). In this case, the probability density function of W is

f_W(w) = 2w/3 for 0 < w ≤ 1 and f_W(w) = (3 − w)/3 for 1 < w < 3.
The distribution of T = 10^{W} has probability density function

f_T(t) = 2 log_{10}(t)/(3t ln 10) for 1 < t ≤ 10 and f_T(t) = (3 − log_{10}(t))/(3t ln 10) for 10 < t < 1000.
Figure 4.5 shows the probability density function of W on the left-hand graph and the probability density function of T on the right-hand graph. The shaded areas on the graphs correspond to the probability that the leading digit is 4. Since three orders of magnitude are spanned by the support of T, there are three ranges (4 ≤ T < 5, 40 ≤ T < 50, and 400 ≤ T < 500) that result in having Y = 4 as a leading digit. The geometry associated with what is happening by picking off the leading digit is most easily seen by considering the support of W. The probability
density function on the range 1 < w < 2 is being shifted to the left by one unit, and the probability density function on the range 2 < w < 3 is being shifted to the left by two units, as seen in Figure 4.6. The probabilities associated with a leading digit of Y = 4, or any other leading digit for that matter, correspond to the rectangle of height 1 in Figure 4.6.
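The same bookkeeping for the asymmetric triangular(0, 1, 3) case can be done exactly from its cumulative distribution function (the function FW below is assembled from the density given above; it is our illustration, not the chapter’s code):

```r
# cdf of W ~ triangular(0, 1, 3): F(w) = w^2 / 3 on (0, 1],
# F(w) = 1/3 + (3*w - w^2/2 - 5/2)/3 on (1, 3)
FW <- function(w) ifelse(w <= 1, w ^ 2 / 3,
                         1 / 3 + (3 * w - w ^ 2 / 2 - 5 / 2) / 3)
# P(Y = 4): three t-ranges, i.e., w in (log10(4), log10(5)) shifted by 0, 1, 2
p4 <- sum(FW(log10(5) + 0:2) - FW(log10(4) + 0:2))
print(all.equal(p4, log10(1 + 1 / 4)))  # TRUE
```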
All of the examples in this section can be viewed through a different lens. Instead of shifting orders of magnitude associated with W onto the unit interval, the shifting can be considered to be a finite mixture model (see for instance [McLPe]). Consider the W ∼ triangular(0, 1, 2) example. This is equivalent to two probability density functions, namely

f_{W₁}(w) = 2w, 0 < w < 1,

and

f_{W₂}(w) = 2(2 − w), 1 < w < 2,

which are mixed together with equal probabilities. In this case, shifting the second density one unit to the left and mixing gives

f_W(w) = (1/2)(2w) + (1/2)(2(1 − w)) = 1, 0 < w < 1.
Since the resulting random variable W ∼ U(0, 1), it corresponds to a random variable T = 10^{W} that satisfies Benford’s Law.
4.4 Conclusions
Benford’s Law is approximated to varying degrees by common parametric distributions. An infinite array of probability distributions can be constructed, however, that satisfy Benford’s Law exactly. The geometry discussed in this chapter works with W = log_{10} T, which, after shifting to collapse the various orders of magnitude onto the unit interval, must be U(0, 1) in order for T to satisfy Benford’s Law. The distribution of W, with odd-numbered leading digits shaded, is shown in Figure 4.7.
Notes:
(^{1}) Department of Mathematics, The College of William & Mary, Williamsburg, VA 23187.