
Financial statement errors: evidence from the distributional properties of financial statement numbers

Review of Accounting Studies

An Erratum to this article was published on 01 September 2015

Abstract

Motivated by methods used to evaluate the quality of data, we create a novel firm-year measure to estimate the level of error in financial statements. The measure, which has several conceptual and statistical advantages over available alternatives, assesses the extent to which features of the distribution of a firm’s financial statement numbers diverge from a theoretical distribution posited by Benford’s Law. After providing intuition for the theory underlying the measure, we use numerical methods to demonstrate that certain error types in financial statement numbers increase the deviation from the theoretical distribution. We corroborate the numerical analysis with simulation analysis that reveals that the introduction of errors to reported revenue also increases the deviation. We then provide empirical evidence that the measure captures financial statement data quality. We first show the measure’s association with commonly used measures of accruals-based earnings management and earnings manipulation. Next, we demonstrate that (1) restated financial statements more closely conform to Benford’s Law than the misstated versions in the same firm-year and (2) as divergence from Benford’s Law increases, earnings persistence decreases. Finally, we show that our measure predicts material misstatements as identified by SEC Accounting and Auditing Enforcement Releases and can be used as a leading indicator to identify misstatements.


Notes

  1. Distributions need to be nontruncated and uncensored to conform to Benford’s Law. For example, a petty cash account with a reimbursement limit of $25 would not be expected to follow Benford’s Law.

  2. Please see Appendix 1 for the full theoretical distribution specified by Benford’s Law.

  3. An intuitive, though mathematically imprecise, way to describe Benford’s Law is as follows. When cumulating numbers from 0, we will reach 100 before we reach 200, and 200 before we reach 300, and so forth. The concept is also scale independent; that is, we will reach 1,000,000 before we reach 9,000,000, and so forth. Given that we stop at a random point each time we cumulate, the process will reach lower first digits (e.g., 1’s and 2’s) more often than higher leading or first digits (e.g., 8’s and 9’s).

  4. For example, the preparer needs to estimate what the returns and rebates on sales will be, as well as sales bonuses, tax payments, and so forth. If there is no error (intentional or otherwise) in the reported numbers, these items should follow Benford’s Law. However, if these estimates contain certain types of errors, then, as the error increases, the estimates will likely diverge further from Benford’s Law.

  5. Our claim that there is not likely to be an ex ante relation with underlying firm characteristics or business models does not imply the absence of a spurious correlation ex post. For example, because firms with lower profitability may be more likely to manipulate their financial statements, our measure may be spuriously correlated with profitability, despite our claim that it is not theoretically related to a firm’s profitability ex ante. Unfortunately, like any other measure that bears resemblance to an exogenous instrumental variable, this lack of correlation cannot be tested (Wooldridge 2010).

  6. Please refer to Sect. 3.4 and Appendix 2 for further detail.

  7. We follow Dechow et al. (2011) and use the term “material misstatement” to refer to SEC AAERs, which the SEC itself refers to as “alleged fraud.”

  8. This pattern is consistent with Dechow et al. (2011), who show an increase in abnormal accruals and a higher probability of manipulation in the years leading up to a material misstatement.

  9. Strategic rounding has also been documented by Grundfest and Malenko (2009).

  10. Benford’s Law has also been employed in auditing software, such as ACL. However, similar to prior research, its use has been limited to internal transactional data on a digit-by-digit (not distributional) basis. To our knowledge, prior to this paper, no commercial auditing software computes the conformity of the entire distribution of first digits, nor assesses firm-year conformity from external corporate financial reports.

  11. Another method to examine conformity relies on the expected distribution of the first two digits (from 10 to 99) of a number (Nigrini 2012). We cannot employ the first two digits in our setting because the number of buckets required to generate the distribution is 90 (i.e., leading two digits, 10–99) instead of 9 (i.e., leading first digits 1–9).

  12. While two other statistics, the Z-statistic and the Chi-square statistic, were widely used in the early stages of the forensic accounting literature, researchers in this area have since progressed to using the MAD statistic (Cleary and Thibodeau 2005; Nigrini 2012). The main deficiency of using the Z-statistic to examine Benford’s Law is that it examines conformity of only a single digit at a time, rather than the composite distribution of digits. The main deficiencies of using the Chi-square statistic are that, unlike the MAD statistic, it assumes observational independence and, similar to the KS statistic, is sensitive to the pool of digits used.

  13. We do, however, exclude data items provided by Compustat that do not appear on firms’ financial statements, e.g., price data. Furthermore, while we would prefer to use the Edgar 10-K filing itself to overcome possible Compustat shortcomings (e.g., missing variables, modified definitions, etc.), extracting the current year’s financial statements from a given 10-K presents technological obstacles that make automated extraction infeasible as well as susceptible to its own biases.

  14. The rationale for doing so is to ensure we do not mechanically create measurement error. Including firm-years with fewer than 100 line items does not alter our results.

  15. Removing the materiality condition does not alter our inferences.

  16. As noted previously, unlike the FSD Score based on the KS statistic, the FSD Score based on the MAD statistic has no critical value against which to test. However, based on simulation analysis, Nigrini (2012) suggests, when using the MAD statistic, a value of 0.006 or lower can be considered as close conformity to Benford’s Law.

  17. While we do not claim that all 16 % of the firms that deviate from Benford’s Law engage in material misreporting, this estimate is consistent with Dyck, Morse, and Zingales (2013), who report that the probability of a firm committing fraud is 14.5 % a year.

  18. These results further imply that the pre-error financial statements follow Benford’s Law: if most financial statements conform to Benford’s Law after errors are introduced, it is likely that they also conformed before the errors were introduced.

  19. The modified Jones model becomes statistically insignificant in explaining FSD only after including ABS_WCACC and ABS_RSST as they capture similar constructs.

  20. The addition of an interaction term to the regression between the FSD score, size, and net income does not alter our results.

  21. According to Dechow et al. (2010): “… the SEC has limited resources that constrain its ability to detect and prosecute misstatements. Thus, the SEC may not pursue cases that involve ambiguity and that it does not expect to win. As a result, the AAER sample is likely to contain the most egregious misstatements and exclude firms that are aggressive but manage earnings within GAAP.”

  22. The example can be extended to include the balance sheet and the statement of cash flows and to span multiple periods.

  23. Insiders and outsiders do not need to know the means and standard deviations of the original distributions or the error term. They simply need to know that the distribution follows Benford’s Law.

  24. Not all misestimated or fabricated data create deviations from Benford’s Law. For example, if the mis-estimation simply multiplies all true realizations by a constant, the new erroneous data will still follow Benford’s Law.

References

  • Beneish, M. (1999). The detection of earnings manipulation. Financial Analysts Journal, 55(5), 24–36.

  • Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78, 551–572.

  • Blodget, H. (2008). Bernie Madoff’s miraculous returns: Month by month. Business Insider, December 12. Accessed August 10, 2013.

  • Burgstahler, D., & Dichev, I. (1997). Earnings management to avoid earnings decreases and losses. Journal of Accounting and Economics, 24(1), 99–126.

  • Bushman, R., & Smith, A. (2003). Transparency, financial accounting information, and corporate governance. Federal Reserve Bank of New York Policy Review (April), 65–87.

  • Carslaw, C. (1988). Anomalies in income numbers: Evidence of goal oriented behavior. The Accounting Review, 63(2), 321–327.

  • Cleary, R., & Thibodeau, J. C. (2005). Applying digital analysis using Benford’s law to detect fraud: The dangers of Type I errors. Auditing: A Journal of Practice & Theory, 24(1), 77–81.

  • Dechow, P., & Dichev, I. (2002). The quality of accruals and earnings: The role of accrual estimation errors. The Accounting Review, 77(Supplement), 35–59.

  • Dechow, P., Ge, W., Larson, C., & Sloan, R. (2011). Predicting material accounting misstatements. Contemporary Accounting Research, 28(1), 17–82.

  • Dechow, P., Ge, W., & Schrand, C. (2010). Understanding earnings quality: A review of the proxies, their determinants and their consequences. Journal of Accounting and Economics, 50, 344–401.

  • Dechow, P., & Skinner, D. (2000). Earnings management: Reconciling the views of accounting academics, practitioners, and regulators. Accounting Horizons, 14(2), 235–250.

  • Duffie, D., & Lando, D. (2001). Term structures of credit spreads with incomplete accounting information. Econometrica, 69(3), 633–664.

  • Durtschi, C., Hillison, W., & Pacini, C. (2004). The effective use of Benford’s law to assist in detecting fraud in accounting data. Journal of Forensic Accounting, 5, 17–34.

  • Dyck, A., Morse, A., & Zingales, L. (2013). How pervasive is corporate fraud? Working paper.

  • Francis, J., LaFond, R., Olsson, P., & Schipper, K. (2004). Cost of equity and earnings attributes. The Accounting Review, 79, 967–1010.

  • Francis, J., LaFond, R., Olsson, P., & Schipper, K. (2005). The market pricing of accruals quality. Journal of Accounting and Economics, 39, 295–327.

  • Gallu, J. (2013). SEC to move past financial crisis cases under Chairman White. Bloomberg, April 18. Accessed July 15, 2013.

  • Grundfest, J., & Malenko, N. (2009). Quadrophobia: Strategic rounding of EPS data. Working paper.

  • Hill, T. (1995). A statistical derivation of the significant digit law. Statistical Science, 10, 354–363.

  • Hill, T. (1996). A note on the distribution of true versus fabricated data. Perceptual and Motor Skills, 83, 776–778.

  • Jones, J. (1991). Earnings management during import relief investigations. Journal of Accounting Research, 29, 193–228.

  • Kothari, S. P., Leone, A., & Wasley, C. (2005). Performance matched discretionary accrual measures. Journal of Accounting and Economics, 39, 163–197.

  • La Porta, R., Lopez-De-Silanes, F., Shleifer, A., & Vishny, R. (2000). Legal determinants of external capital. Journal of Finance, 55(1), 1–33.

  • Ley, E. (1996). On the peculiar distribution of the US stock indexes’ digits. The American Statistician, 50(4), 311–313.

  • Li, F. (2008). Annual report readability, current earnings, and earnings persistence. Journal of Accounting and Economics, 45, 221–247.

  • McKenna, F. (2012). Is the SEC’s Ponzi crusade enabling companies to cook the books, Enron-style? Forbes, October 18. Accessed July 15, 2013.

  • McKenna, F. (2013). Where should SEC start a fraud crackdown? Maybe look at fake restatements. Forbes.com, June 18. Accessed July 15, 2013.

  • Michalski, T., & Stoltz, G. (2013). Do countries falsify economic data strategically? Some evidence that they might. The Review of Economics and Statistics, 95(2), 591–616.

  • Morrow, J. (2010). Benford’s law, families of distributions and a test basis. Working paper.

  • Nigrini, M. (1996). Taxpayer compliance application of Benford’s law. Journal of the American Taxation Association, 18(1), 72–92.

  • Nigrini, M. (2012). Benford’s law: Applications for forensic accounting, auditing, and fraud detection. Hoboken, NJ: Wiley.

  • Nigrini, M., & Miller, S. (2009). Data diagnostics using second-order tests of Benford’s law. Auditing: A Journal of Practice & Theory, 28(2), 305–324.

  • Owens, E., Wu, J., & Zimmerman, J. (2013). Business model shocks and abnormal accrual models. Working paper.

  • Pike, D. (2008). Testing for the Benford property. Working paper.

  • Pimbley, J. M. (2014). Benford’s law and the risk of financial fraud. Risk Professional (May), 1–7.

  • Rajan, R., & Zingales, L. (2003a). Financial dependence and growth. American Economic Review, 88(3), 559–586.

  • Rajan, R., & Zingales, L. (2003b). The great reversals: The politics of financial development in the twentieth century. Journal of Financial Economics, 69(1), 5–50.

  • Ray, S., & Lindsay, B. (2005). The topography of multivariate normal mixtures. The Annals of Statistics, 33(5), 2042–2065.

  • Richardson, S., Sloan, R., Soliman, M., & Tuna, I. (2005). Accrual reliability, earnings persistence and stock prices. Journal of Accounting and Economics, 39, 437–485.

  • Shumway, T. (2001). Forecasting bankruptcy more accurately: A simple hazard model. The Journal of Business, 74, 101–124.

  • Smith, S. (1997). The scientist and engineer’s guide to digital signal processing. San Diego, CA: California Technical Publishing.

  • Thomas, J. (1989). Unusual patterns in reported earnings. The Accounting Review, 64(4), 773–787.

  • Varian, H. (1972). Benford’s law. The American Statistician, 26, 65–66.

  • Whalen, D., Cheffers, M., & Usvyatsky, O. (2013). 2012 financial restatements: A twelve year comparison. Audit Analytics.

  • Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.


Acknowledgments

We would like to thank Patty Dechow (the editor), an anonymous referee, Anil Arya, Rob Bloomfield, Qiang Cheng, Ilia Dichev, Dick Dietrich, Peter Easton, Paul Fischer, Ken French, Joseph Gerakos (CFEA discussant), Jon Glover, Trevor Harris, Colleen Honigsberg, Jeff Hoopes, Gur Huberman, Amy Hutton, Bret Johnson, Steve Kachelmeier, Alon Kalay, Bin Ke, Bill Kinney, Alastair Lawrence, Melissa Lewis-Western, Scott Liao, Sarah McVay (FARS discussant), Rick Mergenthaler, Brian Miller, Brian Mittendorf, Suzanne Morsfield, Suresh Nallareddy, Jeff Ng, Craig Nichols, Mark Nigrini, Doron Nissim, Ed Owens, Bugra Ozel, Oded Rozenbaum, Gil Sadka, Richard Sansing, Richard Sloan, Steve Smith, Steve Stubben, Alireza Tahbaz-Salehi, Dan Taylor, Andy Van Buskirk, Kyle Welch, Jenny Zha (TADC discussant), Amir Ziv, conference participants at the 2014 AAA FARS Midyear Meeting, and workshop participants at Columbia University, Baruch College, Dartmouth College, Florida Atlantic University, George Washington University, Georgetown University, the London Trans-Atlantic Doctoral Conference, Nanyang Technological University, Singapore Management University, Syracuse University, UC Berkeley, UCLA, UNC, The University of Texas—Austin, and The University of Utah for their helpful comments and suggestions. We would also like to thank the PCAOB and the SEC for their insights.

Author information

Correspondence to Zahn Bozanic.

Appendices

Appendix 1: How to calculate conformity to Benford’s Law, an empirical example

| Assets | | Liabilities | |
| --- | --- | --- | --- |
| Cash | 1364 | Accounts payable | 1005 |
| Accounts receivable | 931 | Short-term loans | 780 |
| Inventory | 2054 | Income taxes payable | 31 |
| Prepaid expenses | 1200 | Accrued salaries and wages | 37 |
| Short-term investments | 38 | Unearned revenue | 405 |
| Total short-term assets | 5587 | Current portion of long-term debt | 297 |
| | | Total short-term liabilities | 2555 |
| Long-term investments | 1674 | | |
| Property, plant, and equipment | 4355 | Long-term debt | 6507 |
| (Less accumulated depreciation) | 2215 | Deferred income tax | 189 |
| Intangible assets | 608 | Other | 587 |
| Other | 84 | | |
| | | Total liabilities | 9838 |
| Total assets | 14,523 | | |
| | | Equity | |
| | | Owner’s investment | 1118 |
| | | Retained earnings | 2732 |
| | | Other | 835 |
| | | Total equity | 4685 |
| | | Total liabilities and equity | 14,523 |

Above is a sample balance sheet. To test its conformity to Benford’s Law, take the first digit of each reported number and calculate the frequency with which each digit occurs. In this case, there are 28 total numbers and eight of them begin with the digit 1, so its frequency is 8/28 = 0.2857.

Next, compare the empirical distribution to Benford’s theoretical distribution:

| Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total occurrences | 8 | 5 | 3 | 3 | 2 | 2 | 1 | 2 | 2 |
| Empirical distribution | 0.2857 | 0.1786 | 0.1071 | 0.1071 | 0.0714 | 0.0714 | 0.0357 | 0.0714 | 0.0714 |
| Theoretical distribution | 0.3010 | 0.1761 | 0.1249 | 0.0969 | 0.0792 | 0.0669 | 0.0580 | 0.0512 | 0.0458 |

The Mean Absolute Deviation (MAD) statistic and the Kolmogorov–Smirnov (KS) statistic can be computed to test the conformity of the empirical distribution to Benford’s distribution.

  1. The KS statistic is calculated as follows:

$$ {\text{KS}} = \max\Big(\left| {\text{AD}}_{1} - {\text{ED}}_{1} \right|,\ \left| \left( {\text{AD}}_{1} + {\text{AD}}_{2} \right) - \left( {\text{ED}}_{1} + {\text{ED}}_{2} \right) \right|,\ \ldots,\ \left| \left( {\text{AD}}_{1} + {\text{AD}}_{2} + \cdots + {\text{AD}}_{9} \right) - \left( {\text{ED}}_{1} + {\text{ED}}_{2} + \cdots + {\text{ED}}_{9} \right) \right|\Big) $$

where AD (actual distribution) is the empirical frequency of the number and ED (expected distribution) is the theoretical frequency expected by Benford’s distribution.

In this example,

$$ \begin{aligned} \max\big(&\left|0.2857 - 0.3010\right|,\ \left|(0.2857 + 0.1786) - (0.3010 + 0.1761)\right|,\ \ldots, \\ &\left|(0.2857 + 0.1786 + 0.1071 + 0.1071 + 0.0714 + 0.0714 + 0.0357 + 0.0714 + 0.0714)\right. \\ &\left.\; - (0.3010 + 0.1761 + 0.1249 + 0.0969 + 0.0792 + 0.0669 + 0.0580 + 0.0512 + 0.0458)\right|\big) = 0.0459 \end{aligned} $$

To test conformity to Benford’s distribution at the 5 % level based on the KS statistic, the test value is calculated as 1.36/√P, where P is the total number, or pool, of first digits used. The test value for the sample balance sheet is 1.36/√28 = 0.2570. Since the calculated KS statistic of 0.0459 is less than the test value, we cannot reject the null hypothesis that the empirical distribution follows Benford’s theoretical distribution.

  2. The MAD statistic is calculated as follows:

$$ \text{MAD} = \frac{\sum_{i=1}^{K} \left| \text{AD}_{i} - \text{ED}_{i} \right|}{K}, $$

where K is the number of leading digits being analyzed (here, K = 9).

In this example,

(|0.2857 − 0.3010| + |0.1786 − 0.1761| + |0.1071 − 0.1249| + |0.1071 − 0.0969| + |0.0714 − 0.0792| + |0.0714 − 0.0669| + |0.0357 − 0.0580| + |0.0714 − 0.0512| + |0.0714 − 0.0458|)/9 = 0.0140.

Since the denominator in MAD is K, this statistic is insensitive to scale (the pool of digits used, or P). The MAD statistic becomes more useful as the total pool of first digits increases, while the KS statistic becomes more sensitive as P increases.

Note that there are no established critical values against which to test the distribution using MAD.
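For concreteness, the following Python sketch (ours, not part of the original article) reproduces the worked example above; the function and variable names are our own choices.

```python
import math

# Benford frequencies for leading digits 1-9
BENFORD = {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}

def first_digit(x):
    """Return the leading digit of a positive number."""
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def conformity(numbers):
    """FSD Score (MAD), KS statistic, and the 5% KS critical value for a list of numbers."""
    counts = {d: 0 for d in range(1, 10)}
    for n in numbers:
        counts[first_digit(n)] += 1
    pool = sum(counts.values())
    freq = {d: counts[d] / pool for d in range(1, 10)}
    mad = sum(abs(freq[d] - BENFORD[d]) for d in range(1, 10)) / 9
    cum, ks = 0.0, 0.0
    for d in range(1, 10):                      # KS: largest cumulative deviation
        cum += freq[d] - BENFORD[d]
        ks = max(ks, abs(cum))
    return mad, ks, 1.36 / math.sqrt(pool)

# The 28 numbers from the sample balance sheet above
balance_sheet = [1364, 931, 2054, 1200, 38, 5587, 1674, 4355, 2215, 608, 84, 14523,
                 1005, 780, 31, 37, 405, 297, 2555, 6507, 189, 587, 9838,
                 1118, 2732, 835, 4685, 14523]

mad, ks, crit = conformity(balance_sheet)
print(f"MAD = {mad:.4f}, KS = {ks:.4f}, KS 5% critical value = {crit:.4f}")
# Prints MAD = 0.0140, KS = 0.0459, critical value = 0.2570, as in the worked example.
```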

Appendix 2: Theoretical underpinnings of the FSD Score

There are two mathematical facts that explain the prevalence of Benford’s Law in empirical data. First, it can be shown that the mantissa (the fractional part, i.e., the digits to the right of the decimal point) of the base 10 log of a number determines the first digit of that number. If the mantissa is between log(d) and log(d + 1), where d is an integer between 1 and 9, then the original number will start with d. Second, since many distributions observed in nature, and all of those characterized by Hill’s (1995) theorem, are smooth and symmetric in the log scale (because of variations of the Central Limit Theorem), the probability of being in a region between n + log(d) and n + log(d + 1), where n is any integer in the logarithmic distribution, is exactly log(d + 1) − log(d). This is precisely the probability given by Benford’s Law. We detail this intuition in the following subsections.

2.1 Determining the first digit of a number

The first fact that mathematically explains the prevalence of Benford’s Law is that we can obtain the leftmost (or first) digit of a positive number by using the following algorithm (Smith 1997; Pimbley 2014). First, calculate the base 10 log of the number. For example, the base 10 log of 7823.22 is 3.893. Second, isolate the mantissa, that is, the part of the number to the right of the decimal point; in our example, it will be 0.893. Third, raise 10 to the power of the mantissa found in the prior step; in our example, 10^0.893 is 7.81. Fourth, the integer part of the number found in the prior step is the first digit of the original number. In our case, the integer part of 7.81 is 7, which is indeed the first digit of our original number 7823.22.

This algorithm shows that the first digit of a number can be recovered from the remainder (or mantissa) of its base 10 log. More formally, any number N will start with the digit d (where d is between 1 and 9) if and only if the mantissa of log(N) is between log(d) and log(d + 1). This means that N will start with 1 if the mantissa of the log of N is between log(1) = 0 and log(2) = 0.301. The number N will start with 2 if the mantissa of the log of N is between log(2) = 0.301 and log(3) = 0.477, and so forth. The advantage of this algorithm is that it reduces a number of any length to a single digit. Furthermore, as this example shows, the differences log(d + 1) − log(d) for digits 1 through 9, which determine the widths of the intervals assigned to each first digit, are exactly the probabilities that a first digit will be d as defined by Benford’s Law, which leads us to the second mathematical fact.
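The algorithm can be expressed in a few lines of code. The sketch below is ours and simply mirrors the four steps above.

```python
import math

def first_digit_via_mantissa(n):
    """Recover the leading digit of a positive number from the mantissa of its base 10 log."""
    mantissa = math.log10(n) % 1        # steps 1-2: take log10 and keep the fractional part
    return int(10 ** mantissa)          # steps 3-4: raise 10 to the mantissa, take the integer part

print(first_digit_via_mantissa(7823.22))   # 7, as in the worked example
# The mantissa 0.893 lies between log10(7) = 0.845 and log10(8) = 0.903, so the digit is 7.
```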

2.2 Probability distribution functions and the area under the curve: uniform distributions

The second mathematical fact that empirically determines the prevalence of the first digit 1 and the rarity of the first digit 9 is that the area under the curve of a probability density function (PDF) is the probability that a number drawn from this distribution will be in this range. To demonstrate the mechanics of this fact, it is convenient to examine the first digits on the log 10 scale rather than the linear scale. Therefore, we initially consider a uniform distribution between 0 and 6 on the log scale (which implies that the distribution ranges from 1 to 1 million on a linear scale). The PDF of this distribution is PDF(log(N)) = 1/6, and the graphical representation is:

figure a

The solid black bars in the figure above are the areas under the curve between every integer n in the distribution and n + log(2) = n + 0.301. If N is a random number drawn from this distribution and falls in any of these areas, it will begin with the number 1 in the linear scale. The reason is that, according to the algorithm discussed in the previous section, any number that is between an integer n and n + 0.301 in the log scale will start with 1 in the linear scale because its mantissa is between 0 and 0.301.

To obtain the probability that a number from this distribution (uniform in the log scale) will start with the digit 1 in the linear scale, we must find the area under the curve between n and n + log(2). We can obtain this by taking the integral of the PDF between n and n + log(2). Thus, the probability that a first digit, d, is 1 can be expressed as:

$$ \begin{aligned} \mathop \sum \limits_{n = 0}^{5} \mathop \smallint \limits_{n}^{{n + { \log }\left(2 \right)}} \frac{1}{6}dN &= 1/6*\left({0.301 - 0} \right) + 1/6*\left({1.301 - 1} \right) + 1/6*\left({2.301 - 2} \right) \hfill \\ &\quad+ 1/6*\left({3.301 - 3} \right) + 1/6*\left({4.301 - 4} \right) + 1/6*\left({5.301 - 5} \right) \hfill \\ &= 0.301 = {\mathbf{log}}\left({\mathbf{2}} \right){-}{\mathbf{log}}({\mathbf{1}}) \hfill \\ \end{aligned} $$

The same rationale applies for every first digit d, where d can equal 1 to 9. That is, if N is distributed uniformly in the log scale, it will follow Benford’s Law because the probability of obtaining the first digit d is exactly log(d + 1) − log(d), which is Benford’s Law. More formally, in the case of our uniform distribution:

$$ \begin{aligned} &\mathop \sum \limits_{n = 0}^{5} \mathop \smallint \limits_{{n + { \log }\left(d \right)}}^{{n + { \log }\left({d + 1} \right)}} \frac{1}{6}dN \hfill \\ &= {\mathbf{log}}\left({{\mathbf{d}} + {\mathbf{1}}} \right){-}{\mathbf{log}}\left({\mathbf{d}} \right) \hfill \\ \end{aligned} $$

2.3 Probability distribution functions and the area under the curve: normal distributions

While the uniform distribution is useful in explaining the intuition, it is less useful when applying the intuition to empirical data. Two types of distributions, the normal and the log-normal, arise naturally in many processes because of variations of the Central Limit Theorem. The intuition above applies in these cases as well. As long as these distributions are spread across a few orders of magnitude in the log scale (e.g., a range between 2 and 4 in the log scale, which translates to 100–10,000 in the linear scale), they will follow Benford’s Law.

To see this clearly, we need to examine a distribution that is distributed normally on the log scale, which means it is log normal in the linear scale. (The distinction between natural log or base 10 log is not crucial here for the shape of the distribution.) Consider a normal distribution with a mean of 5 and standard deviation of 1 in the log scale.

$$ {\text{PDF}}\left({{\text{Log}}\left(N \right),\mu = 5,\sigma = 1} \right) = \frac{1}{{\sqrt{2\pi}}}{\rm e}^{{- \frac{{(x - 5)^{2}}}{2}}} $$
figure b

The shaded area in the figure above represents all the areas between any integer n and n + 0.301. While this is not as apparent to the naked eye as it was for the uniform distribution above, the area under the curve in all sections between n and n + 0.301 is the probability that a number in the linear scale starts with 1. Here, the probability that a first digit is 1 is:

$$ \sum\limits_{n = - \infty}^{\infty} {\int\limits_{n + \log (1)}^{n + \log (2)} {\frac{1}{{\sqrt {2\pi}}}e^{{- \frac{{(x - 5)^{2}}}{2}}}}} dN \cong {\mathbf{0}}.{\mathbf{301}} = {\mathbf{log}}\left({\mathbf{2}} \right)-{\mathbf{log}}\left({\mathbf{1}} \right) $$

Similarly, we can find the probability of any digit for this normal distribution in the following way:

$$ \sum\limits_{n = - \infty}^{\infty} {\int\limits_{n + \log (d)}^{n + \log (d + 1)} {\frac{1}{{\sqrt {2\pi}}}e^{{- \frac{{(x - 5)^{2}}}{2}}}}} dN \cong {\mathbf{log}}\left({{\mathbf{d}} + {\mathbf{1}}} \right)-{\mathbf{log}}\left({\mathbf{d}} \right). $$

2.4 Probability distribution functions and the area under the curve: generic distributions

More generally, for any given probability distribution function, the probability that a first digit begins with d can be found by obtaining the area under the curve for the function specified:

$$ \sum\limits_{n = - \infty}^{\infty} {\int\limits_{n + \log (d)}^{n + \log (d + 1)} {PDF(\log (N))}} dN $$

For a given digit d, if the area under the curve is equal to log(d + 1) − log(d), then the probability that the first digit for the numbers drawn from this distribution is d will follow Benford’s Law. Stated differently, if a distribution is smooth and symmetric in the log scale over several orders of magnitude, it will follow Benford’s Law (Smith 1997; Pimbley 2014). This happens because the area under the curve from n + log(d) to n + log(d + 1) is equal to log(d + 1) − log(d), which is equal to the probability that a first digit is d under Benford’s Law. Since many empirical distributions tend to be smooth and symmetric in the log scale, it is not surprising that first digits are empirically distributed following Benford’s Law.
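To make the integral concrete, the sketch below (ours; the trapezoidal integration, grid, and parameter choices are illustrative) numerically evaluates this area for the Sect. 2.3 normal log-scale PDF and compares it with Benford’s frequencies.

```python
import math

def benford(d):
    return math.log10(d + 1) - math.log10(d)

def normal_log_pdf(x, mu=5.0, sigma=1.0):
    """The Sect. 2.3 distribution: normal in the log scale with mean 5 and standard deviation 1."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def digit_probability(pdf, d, n_values=range(-5, 16), steps=400):
    """Sum the areas under the log-scale PDF over every interval [n + log(d), n + log(d + 1)]."""
    total = 0.0
    for n in n_values:
        a, b = n + math.log10(d), n + math.log10(d + 1)
        h = (b - a) / steps
        # trapezoidal rule on [a, b]
        total += h * (0.5 * pdf(a) + sum(pdf(a + i * h) for i in range(1, steps)) + 0.5 * pdf(b))
    return total

fsd = sum(abs(digit_probability(normal_log_pdf, d) - benford(d)) for d in range(1, 10)) / 9
for d in range(1, 10):
    print(f"d = {d}: area = {digit_probability(normal_log_pdf, d):.4f}, Benford = {benford(d):.4f}")
print(f"FSD Score of this PDF = {fsd:.4f}")   # close to zero, as argued above
```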

2.5 Mean absolute deviation and financial statement deviation

It is not sufficient to examine only a single digit in isolation to detect deviation from Benford’s Law (Smith 1997). A natural measure to examine the distance of all leading digits from Benford’s Law is the Mean Absolute Deviation (MAD), which takes the mean of the absolute value of the difference between the empirical frequency of each leading digit that appears in the distribution and the theoretical frequency specified by Benford’s Law. We can now construct the Financial Statement Deviation (FSD) Score based on the Mean Absolute Deviation (MAD) statistic:

$$ \text{FSD Score} = \frac{\sum_{d = 1}^{9} \left| \text{AD}_{d} - \text{ED}_{d} \right|}{9}, $$

where AD_d is the empirical frequency of leading digit d in the firm-year’s financial statement numbers and ED_d = log(d + 1) − log(d) is the frequency expected under Benford’s Law.

The FSD Scores of the uniform and normal log-scale PDFs above are equal to zero. This occurs because, as shown above, since these distributions are smooth and symmetric, the probability that a number drawn from either of these distributions begins with a digit d is log(d + 1) − log(d), which is exactly the probability given by Benford’s Law. Therefore, for each first digit d, there is no deviation from Benford’s Law, which implies that the mean of the absolute deviations, as captured by the FSD Score, is equal to zero.

Appendix 3: A stylized numerical model

To strengthen the intuition regarding the way Benford’s Law can be used to detect errors in accounting data, consider the following setting. A manager starts a project at year 1 that has a vector X of K different random cash flow streams X1, X2, …, XK. All cash flow streams will be realized in year 2 and are constructed to be positive (i.e., we take the absolute value of the cash flow streams). X1 is the random flow of cash from activity 1 (say, cash flow from revenue from activity 1), X2 is the random flow of cash from activity 2 (say, cash outflow for payment to suppliers), and Xk is the random flow of cash from activity k, with XK the last cash flow stream.Footnote 22 Assume that the K cash flows are all log normal (base 10) distributed with mean µk and standard deviation σk (in the log scale), which implies that log(Xk) is distributed normal (µk, σk). For simplicity, we assume all cash flows and error terms are uncorrelated with each other; we modify this assumption later in our illustration.

At the end of year 1, the manager needs to report financial statements that include his estimate of the cash flow stream X. This report could be the manager’s best estimate, could be strategically manipulated, or could be constrained by correct application of accounting methods; we do not distinguish between these possibilities. The report is a vector Y with K different estimates, one for each of the K cash flows. To make the calculation tractable, assume that Yk = Xk*Zk, where Z is a vector of the estimation errors for each of the Xk. If Zk = 1, there is no error in the estimation. If Zk > 1, there is over-estimation of the true Xk, and if Zk < 1, there is under-estimation of Xk. The reason for the multiplicative error structure, rather than the more common additive error structure, is that we can then easily recast the example in the log scale as log(Yk) = log(Xk) + log(Zk), that is, there is an additive error in the log scale, which makes the problem more tractable. Since log(1) is zero, it is clear that if there is no error, Zk = 1 and log(Yk) = log(Xk).

Since we showed above that normal distributions in the log scale follow Benford’s Law, adding an error term Zk that is distributed log normal with a mean µεk and standard deviation σεk does not create deviation from the law. The reason is that the convolution in the log scale of Y (i.e., the distribution of log(Xk) + log(Zk)) will be distributed normal (µk + µεk, σk + σεk). This distribution will also follow Benford’s Law, even if there is a nonzero mean error (µεk ≠ 0) or decreased precision (σεk > 0).

However, the example becomes more interesting when we look at the errors in the report in a specific year (i.e., when we look at the distribution of the cross section of all the Xks in 1 year). The reason is that, despite the fact that all Xks in a given year are distributed normally in the log scale, the mixture distribution of the vector X for that year will not be normally distributed unless the means of the underlying distributions are equal. The distribution of the vector X in the cross section is a mixture distribution, and its density function is given by the following formula:

$$ {\text{PDF}}\left(X \right) = \sum\nolimits_{k = 1}^{K} {W_{k} *PDF\left({X_{k}} \right),} $$

where Wk is the weight of each of the individual distributions that comprise the mixture distribution. In our case, since the Xks are distributed normally in the log scale, the mixture distribution is given by the following expression:

$$ {\text{PDF}}\left({{\rm log}\left(X \right)} \right) = \sum\limits_{k = 1}^{K} {\frac{1}{K}\left({\frac{1}{{\sigma_{k} \sqrt {2\pi}}}e^{{- \frac{{(x - \mu_{k})^{2}}}{{2\sigma_{k}^{2}}}}}} \right)} $$

The theoretical FSD Score of X (in the cross section) in this case is therefore:

$$ {\text{FSD Score}} = \frac{{\sum}_{d = 1}^{9} {{\rm ABS}\left[\left({\sum}_{n = - \infty}^{\infty} {\int\limits_{n+ \log (d)}^{n + \log (d + 1)} {{\sum}_{k = 1}^{K}{\frac{1}{K}\left(\frac{1}{{\sigma_{k} {\sqrt {2\pi}}}}e^{{-\frac{{(x - \mu_{k})^{2}}}{{2\sigma_{k}^{2}}}}}\right)}}}dX_{k}\right) - ({\log}(d + 1) - {\log}(d))\right]}}{9}.$$

A mathematically interesting fact about the mixture of normal distributions is that when the means of the distributions are less than two standard deviations apart, the resulting distribution has a single peak, and it looks exactly like a normal distribution (Ray and Lindsay 2005). Therefore, it will follow Benford’s Law. More importantly, Hill (1995) provides a proof that mixtures of distributions that do not contain error will follow Benford’s Law under certain conditions. However, there is no analytical or empirical way to show that these conditions are met in the context of financial accounting. We do, however, show that the distribution of Y in the log scale appears to be relatively smooth and symmetric (and looks similar to a normal distribution). Figure 1a plots the empirical density function of all numbers from all financial statements from 2001 to 2011 in the log scale, which suggests that the underlying no-error distribution follows Benford’s Law as well. Figure 1b shows the distribution in the log scale for a typical firm, Alcoa in 2011.

Fig. 1
figure 1

a Distribution of log of numbers in all financial statements (Log(Y)), b Distribution of log of numbers for Alcoa’s 2011 financial statements (Log(Y))

Solving for a general closed-form solution of how the FSD Score changes with the error term Z is beyond the scope of this paper, so we leave this question for future analytical research. However, we now extend the analysis and use numerical parameters for specific cases to illustrate how the FSD Score changes.

3.1 A special numerical solution

Assume there are 10 groups of cash flow streams (i.e., K = 10, so we have X1 to X10 cash flow streams) and that each of the cash flow streams has a different mean in the log scale, ranging from 4 to 4.9 in steps of 0.1 (i.e., µ1 = 4, µ2 = 4.1, …, µ10 = 4.9), which means the numbers range from 10,000 to 100,000 in the linear scale. Finally, assume that the standard deviation of each of the Xks in the log scale is σk = 1.

The probability density function of X, that is, the mixture distribution in this year, is therefore the following: PDF (log(X)) = \( \sum\limits_{k = 1}^{10} {\frac{1}{10}\left(\frac{1}{{\sqrt {2\pi}}}e^{{- \frac{{(x - \mu_{k})^{2}}}{2}}}\right)} \). As can be seen in the figure below, this distribution is smooth and symmetric and looks similar to a normal distribution:

figure d

Furthermore, this distribution follows Benford’s Law, and the FSD Score for this distribution under those parameters is FSD Score = 0.
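As a rough check on this claim, the following Monte Carlo sketch (ours; the sampling approach and sample size are illustrative, and the paper instead evaluates the score by integration) draws from the mixture and measures the FSD Score empirically.

```python
import math
import random

BENFORD = [math.log10(d + 1) - math.log10(d) for d in range(1, 10)]

def first_digit(x):
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def fsd_score(values):
    """FSD Score (MAD form): mean absolute deviation of leading-digit frequencies from Benford."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    total = len(values)
    return sum(abs(counts[i] / total - BENFORD[i]) for i in range(9)) / 9

def draw_mixture(n_draws, mean_shifts=(0.0,) * 10):
    """Sample the K = 10 equal-weight mixture: log-normal streams with mu_k = 4.0, ..., 4.9 and sigma = 1."""
    means = [4.0 + 0.1 * k + s for k, s in zip(range(10), mean_shifts)]
    return [10 ** random.gauss(random.choice(means), 1.0) for _ in range(n_draws)]

random.seed(0)
print(f"No-error mixture: FSD Score = {fsd_score(draw_mixture(200_000)):.4f}")
# With a large sample this is approximately zero, consistent with the FSD Score = 0 reported above.
```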

The problem is that X is unobservable to an outsider (and may also be unobservable to the manager). The outsider observes only the Yks, where Yk = Xk*Zk. Any conclusions about the errors that outsiders can draw must come from the distribution of the reported vector of numbers Y.Footnote 23 If Zk is distributed log normal, which means it is distributed normal in the log scale with µεk and σεk, then each Yk is also distributed normal in the log scale with parameters µyk = µk + µεk and σyk = σk + σεk. This is essentially the distribution of the sum of two normal variables. Now consider the following three cases.

3.2 Error distributions with equal means and equal standard deviations (Case 1)

In this case, µε1 = µε2 = … = µε10 = Constant C and σε1 = σε2 = … = σε10 = Constant S. In this case, µyk = µk + C and σyk = σk + S. The resulting mixture distribution of Y in the log scale will again look like the distribution of X but shifted to the right by a constant C and flatter because of the increased standard deviation, that is,

$$ {\text{PDF}}\left({{ \log }\, (Y )} \right) = \sum\limits_{k = 1}^{K} {\frac{1}{K}\left({\frac{1}{{(\sigma_{k} + S){\sqrt {2\pi}}}}e^{{- \frac{{(x - \mu_{k} - C)^{2}}}{{2(\sigma_{k} + S)^{2}}}}}} \right)}, $$

which will follow Benford’s Law to a similar degree as the distribution of X. This is because multiplying a distribution that follows Benford’s Law in the linear scale by a constant creates a distribution that follows Benford’s Law (Hill 1995). With parameters C = 0.5 and S = 0.01, the FSD Score of the resulting distribution is zero, and its PDF is shown in the figure below:

figure e

In conclusion, adding identical error terms to all the Xks does not create deviations from Benford’s Law.

3.3 Error distributions with equal means but different standard deviations (Case 2)

In this case, µε1 = µε2 = … = µε10 = Constant C and σεk varies across the ks. Therefore, µyk = µk + C and σyk = σk + σεk. The resulting mixture distribution of Y in log scale will again look like the distribution of X but wider because of the increased standard deviation. Still, it will closely follow Benford’s Law. Here again the FSD Score is zero.

3.4 Error distributions with different means but constant standard deviation (Case 3)

In this case, µεk varies across the ks and σε1 = σε2 = … = σε10 = Constant S. This is the interesting case as it will create deviations from Benford’s Law. We consider three different subcases.

3.5 Error in the estimation of a single element in the cash flow streams (Case 3A)

We start with the simple case where we change only µε10 to add error to X10, which is the highest number in our cash flow streams. We will start increasing µε10 by increments of 0.1. Therefore µy10 will grow from 4.9 to 5.0 in the first iteration, to 5.1 in the next iteration, and so on. This situation could be an example of overestimating revenues. The graphical evidence on the way the mixture distribution changes and the resulting FSD Scores is striking for the case of S = 0.01. In the case of µε10 = 0.1 and S = 0.01, the FSD Score is 0.008, and the resulting distribution is shown in the figure below (left). In the case of µε10 = 0.5 and S = 0.01, the FSD Score is 0.017, and the resulting distribution is shown in the figure below (right).

figure f

As we increase the mean of the error, under these parameters, the distribution monotonically moves further away from Benford’s Law and reaches a limit. This case is consistent with managing revenue upward (or overestimating revenue compared to the actual distribution) leading to deviations in Benford’s Law and an increase in the FSD Score.
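Continuing the Monte Carlo sketch from Sect. 3.1 (it reuses first_digit, fsd_score, and draw_mixture defined there), the snippet below shifts only µε10 and re-measures the score; the parameters are illustrative and the simulated values will differ slightly from the integration-based figures.

```python
# Case 3A: add a mean error only to the largest stream X10 by shifting mu_10 upward.
import random

random.seed(0)
for err in (0.0, 0.1, 0.3, 0.5):
    shifts = (0.0,) * 9 + (err,)      # mu_eps,10 = err; all other error means are zero
    print(f"mu_eps10 = {err:.1f}: FSD Score = {fsd_score(draw_mixture(200_000, shifts)):.4f}")
# Under these illustrative parameters the simulated score should rise with the error,
# qualitatively matching the 0.008 and 0.017 values reported for mu_eps10 = 0.1 and 0.5.
```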

3.6 The case where the errors are correlated with each other (Case 3B)

The case above represents an error in one element of the report. However, a feature of the accounting system is that an error in one element leads to errors in other elements as well. For example, if the manager overestimates revenue, he is also likely to overestimate cost of goods sold (in an amount less than revenue) to match the revenue and will overestimate the related tax payment (in an amount less than revenue). In terms of our example, there will be a mean error in several of the Zks. For example, let us assume µε10 is increasing by increments of 0.1 as before, but now µε5 = 0.5µε10 and µε1 = 0.1µε10. Again, it is clear from the shape of the graph and the change in the FSD Score that this will cause a significant deviation from Benford’s Law.

In the case of µε10 = 0.1, µε5 = 0.5µε10, µε1 = 0.1µε10, and S = 0.01, the FSD Score is 0.009, and the resulting distribution is shown in the figure below (left). In the case of µε10 = 0.5, µε5 = 0.5µε10, µε1 = 0.1µε10, and S = 0.01, the FSD Score is 0.017, and the resulting distribution is shown in the figure below (right).

figure g

Once again, the point to make from this exercise is that deviation from Benford’s Law, under these parameters, is monotonically increasing with the error and reaches a limit, even when the errors are correlated with each other.

3.7 The case where the errors are correlated with the mean of the cash flow streams (Case 3C)

It is also possible that the estimation errors may be larger for items that are larger. In terms of our example, µεk is a function of µk. For the sake of simplicity, assume µεk = µk * B, where B is a constant multiplier that determines the error size (the larger is B, the larger is the error). It is clear that, if B is zero, we revert to Case 1, and the distribution follows Benford’s Law exactly with an FSD Score equal to 0. However, when we start increasing B by increments of 0.1, the distributions start to change. In the case of µεk = µk * B, B = 1.1, and S = 0.01, the FSD Score is 0.004, and the resulting distribution is shown in the figure below (left). In the case of µεk = µk * B, B = 1.5, and S = 0.01, the FSD Score is 0.016, and the resulting distribution is shown in the figure below (right).

figure h

In this case, uneven errors across accounts create deviations from Benford’s Law that, under these parameters, monotonically increase the FSD Score before reaching a limit.

Appendix 4: Numerical example when realizations are observable

To see the intuition for why deviations from Benford’s Law can be used to assess data quality using real-world data, consider the following example. The market value of equity at the end of a trading day is one realization of a random distribution. A sample of different firms on a random day is likely to fit the criteria in Hill (1995). Indeed, consistent with Hill (1996), when examining a random sample of the market value of equity of companies traded in the United States, the distribution follows Benford’s Law. Now assume that instead of measuring the market value of equity accurately by transaction price (where we can observe true realizations), the actual realizations are unknown. Therefore, the data provider has to use estimation techniques (for example, using last year’s prices times the average return from 2 years ago, or just randomly choosing based on a possible distribution of prices). Errors in the estimation techniques or fabricated data (random or human) are likely to create a dataset very different from the true realized distribution and hence create a deviation from Benford’s Law.Footnote 24 Therefore, the deviation from Benford’s Law can be used as a proxy for how divergent a dataset is from the true, unobservable realizations. If the realization is known and can be measured with complete accuracy, then there is obviously no need to use Benford’s Law to validate the data. However, in that case, since the realizations are known, we can observe the actual deviation from the true distribution. Below, we illustrate this with real data.

We look at the market value of equity (MVE) for all firms with available data in CRSP’s monthly file (price and shares outstanding) for a random day, August 31, 2011, to build intuition for why Benford’s Law can be used to assess data quality. MVE (price * shares outstanding) is a random distribution, and as expected, the FSD Score for MVE for all firms (created using the distribution of the first digits of all firms with available data) is 0.00295, which can be considered close conformity to Benford’s Law.

Next, we ask, what if the true market price is unknown, and instead, MVE needs to be estimated or is fabricated? To answer this question, we introduce a noise term that changes MVE, where firm-level MVE is equal to MVE * (1 + a randomly generated number from a normal distribution) and then re-measure the FSD Score. We manipulate the mean of the random number (i.e., the estimation error) first, with the expectation that, as the size of the noise increases, deviation from Benford’s Law should also increase. We next keep the mean constant and manipulate the variance, expecting the FSD Score to remain constant since we are no longer changing the magnitude of the noise.
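A minimal sketch of this re-measurement procedure is shown below. It is ours, not the paper’s code; because we do not reproduce the CRSP extract here, the mve list is a hypothetical stand-in, and its printed scores will not match the table values that follow.

```python
import math
import random

BENFORD = [math.log10(d + 1) - math.log10(d) for d in range(1, 10)]

def first_digit(x):
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def fsd_score(values):
    counts = [0] * 9
    for v in values:
        if v > 0:                         # skip draws pushed to zero or below by the noise
            counts[first_digit(v) - 1] += 1
    total = sum(counts)
    return sum(abs(counts[i] / total - BENFORD[i]) for i in range(9)) / 9

def perturb(mve, mean, var):
    """Re-estimate MVE as MVE * (1 + noise), with noise drawn from a normal(mean, var)."""
    sd = math.sqrt(var)
    return [v * (1 + random.gauss(mean, sd)) for v in mve]

# `mve` should hold price * shares outstanding for every firm on the chosen CRSP day;
# a synthetic stand-in keeps the sketch runnable.
random.seed(0)
mve = [10 ** random.gauss(5.5, 1.2) for _ in range(5000)]

print(f"original:           FSD Score = {fsd_score(mve):.5f}")
for m in (1, 2, 3, 4):
    print(f"mean = {m}, var = 1:  FSD Score = {fsd_score(perturb(mve, m, 1)):.5f}")
for v in (2, 3, 4):
    print(f"mean = 1, var = {v}:  FSD Score = {fsd_score(perturb(mve, 1, v)):.5f}")
```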

As can be seen below, holding the variance constant, when we increase the mean noise term, the FSD Score increases.

| Constant variance | MVE FSD Score |
| --- | --- |
| Mean = 1, var = 1 | 0.00294 |
| Mean = 2, var = 1 | 0.00304 |
| Mean = 3, var = 1 | 0.00320 |
| Mean = 4, var = 1 | 0.00322 |

In contrast, holding the mean noise term constant, when we increase the variance, the FSD Score remains stable.

| Constant mean | MVE FSD Score |
| --- | --- |
| Mean = 1, var = 2 | 0.00292 |
| Mean = 1, var = 3 | 0.00293 |
| Mean = 1, var = 4 | 0.00292 |

These results provide insights into why Benford’s Law and the FSD Score can be used to assess the quality of data in financial statements. Financial statement numbers require significant estimation by management. Investors (and even possibly managers) do not observe the true realization of these numbers. Much like changing the mean of the noise term in the MVE example, as the estimation error in financial statement numbers increases, we expect the FSD Score to increase as well.

Appendix 5: Simulation analysis

To demonstrate how a firm’s potential manipulation of its financial results could alter its conformity to Benford’s Law, we ran a simulation that involved changing the value of a single line item in a firm’s income statement and calculated how that change affected the financial statements overall. We then re-measured the FSD Score based on the manipulation and the changes the manipulation induced in the financial statements.

We chose to manipulate sales since it is an item that managers may be tempted to change to mask poor performance and is interconnected with many other financial statement items. As a result of the sales manipulation, a firm likely needs to adjust cost of goods sold and tax expense accordingly. Our simulation randomly (from a uniform distribution) seeded a journal entry to increase sales by between 5 and 10 % to make the change material. Cost of goods sold was increased by between 20 and 90 % of the sales increase, and taxes payable were increased by between 0 and 35 % of the difference between the previous two amounts. Put more simply, we added three journal entries to the original numbers:

1. Increase accounts receivable; increase revenue

2. Increase cost of goods sold; decrease inventory

3. Increase tax expense; increase taxes payable

As a result of the journal entries, we list below the line items that changed in our simulation when sales changed as described above.

Income statement

 Sales

 Cost of goods sold

 Gross profit (Loss)

 Operating income after depreciation

 Operating income before depreciation

 Pretax income

 Pretax income–domestic

 Income taxes—federal

 Income taxes—total

 Income before extraordinary items

 Income before extraordinary items—adjusted for common stock equivalents

 Income before extraordinary items—available for common

 Income before extraordinary items and noncontrolling interests

 Net income adjusted for common/ordinary stock (Capital) equivalents

Balance sheet

 Receivables—Trade

 Receivables—Total

 Inventories—finished goods

 Inventories—total

 Current assets—total

 Assets—total

 Income taxes payable

 Current liabilities—total

 Liabilities—total

 Retained earnings

 Stockholders equity—total

 Liabilities and stockholders equity—total

Statement of cash flow

 Income before extraordinary items (cash flow)

 Accounts receivable—decrease(increase)

 Inventory—decrease (increase)

 Income taxes—accrued—increase/(decrease)

In our simulation, we chose to manipulate a firm with a set of financial numbers that generally, but not perfectly, conforms to Benford’s Law. We therefore chose Alcoa’s 2011 financial results, since the results not only conform to Benford’s Law but also contain a large number of line items, ensuring that a single number does not have an undue impact on our measurements. In running the simulation 1000 times, Alcoa’s FSD Score increased in 950 of the runs (95 %). We interpret the findings from our simulation to imply that divergence from Benford’s Law could signal that a firm is intentionally manipulating its financial numbers.
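A single run of the seeding procedure could be implemented along the following lines. This is our sketch: the dictionary keys and the toy numbers are hypothetical, and the full simulation would also recompute the downstream line items listed above before re-measuring the FSD Score.

```python
import random

def seed_manipulation(fs):
    """Apply the three Appendix 5 journal entries to a dict of financial statement line items.

    Draw ranges follow the text: sales up 5-10 %, cost of goods sold up by 20-90 % of the
    sales increase, and taxes payable up by 0-35 % of the difference between the two.
    The dictionary keys are hypothetical labels, not Compustat mnemonics.
    """
    out = dict(fs)
    d_sales = fs["sales"] * random.uniform(0.05, 0.10)
    d_cogs = d_sales * random.uniform(0.20, 0.90)
    d_tax = (d_sales - d_cogs) * random.uniform(0.00, 0.35)
    out["sales"] += d_sales                # entry 1: Dr accounts receivable / Cr revenue
    out["receivables"] += d_sales
    out["cogs"] += d_cogs                  # entry 2: Dr cost of goods sold / Cr inventory
    out["inventory"] -= d_cogs
    out["tax_expense"] += d_tax            # entry 3: Dr tax expense / Cr taxes payable
    out["taxes_payable"] += d_tax
    # In the full simulation, the affected totals (gross profit, pretax income, total assets,
    # retained earnings, etc.) are recomputed from these changes before re-measuring the FSD Score.
    return out

# Toy illustration with made-up numbers (not Alcoa's actual figures):
random.seed(1)
toy = {"sales": 24951, "receivables": 7713, "cogs": 19109, "inventory": 2874,
       "tax_expense": 255, "taxes_payable": 376}
print(seed_manipulation(toy))
# The full exercise repeats this seeding 1000 times on the firm's reported line items and
# counts how often the FSD Score of the manipulated statements exceeds the original score.
```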

Appendix 6: Variable definitions

Each variable is listed below as Variable (description): definition.

  • FSD_Score based on the MAD statistic (mean absolute deviation statistic for annual financial statement data): The sum of the absolute differences between the empirical distribution of leading digits in annual financial statements and their theoretical Benford distribution, divided by the number of leading digits. See Appendix 1 for a sample calculation.

  • FSD_Score based on the KS statistic (Kolmogorov–Smirnov statistic for annual financial statement data): The maximum deviation of the cumulative differences between the empirical distribution of leading digits in annual financial statements and their theoretical Benford distribution. See Appendix 1 for a sample calculation.

  • AAER (indicator equal to 1 for the year in which a firm was first identified by the SEC as having materially misstated its financial statements): Firms that were included in the annual SEC Accounting and Auditing Enforcement Releases (AAER) database (Dechow et al. 2011).

  • ABS_JONES_RESID (absolute value of the residual from the modified Jones model, following Kothari et al. 2005): The following regression is estimated for each industry-year: tca = ∆sales + net PPE + ROA, where tca = (∆current assets − ∆cash − ∆current liabilities + ∆debt in current liabilities − depreciation and amortization), ROA is defined as below, and all variables are scaled by beginning-of-period total assets.

  • STD_DD_RESID (five-year moving standard deviation of the Dechow-Dichev residual, following Francis et al. 2005): The following regression is estimated for each industry-year: tca = cfo(t−1) + cfo(t) + cfo(t+1), where tca is defined as above and cfo = (income before extraordinary items − (wc_acc − depreciation and amortization)). All variables are scaled by average total assets. The five-year rolling standard deviations of the residuals are then calculated.

  • MANIPULATOR (indicator variable equal to 1 if the M-Score is greater than −1.78): M-Score is calculated following Beneish (1999).

  • F_SCORE (the scaled probability of earnings management or a misstatement for a firm-year based on firm financial characteristics): Calculated using the coefficients in Table 7, Model 2, of Dechow et al. (2011).

  • ABS_WCACC (the absolute value of working capital accruals): Calculated as (∆current assets − ∆cash − ∆current liabilities + ∆debt in current liabilities + ∆taxes paid) scaled by average total assets.

  • ABS_RSST (the absolute value of working capital accruals as defined by Richardson et al. 2005): Calculated as (∆WC + ∆NCO + ∆FIN) scaled by average total assets, where WC = (current assets − cash and short-term investments) − (current liabilities − debt in current liabilities); NCO = (total assets − current assets − investments and advances) − (total liabilities − current liabilities − long-term debt); and FIN = (short-term investments + long-term investments) − (long-term debt + debt in current liabilities + preferred stock).

  • LOSS (indicator for a firm-year with negative net income): Equal to 1 if net income < 0, 0 otherwise.

  • CH_CS (change in cash sales): (Cash sales(t) − cash sales(t−1))/cash sales(t−1), where cash sales = total revenue − ∆total receivables.

  • ROA (return on assets): Income before extraordinary items(t)/total assets(t−1).

  • CH_ROA (change in ROA): ROA(t) − ROA(t−1).

  • SOFT_ASSETS (soft assets): (Total assets − net PPE − cash)/total assets(t−1).

  • ISSUE (indicator variable that equals 1 if the company issued debt or equity in that year): When long-term debt issuance (Compustat DLTIS) > 0 or sale of common or preferred stock (SSTK) > 0, then ISSUE = 1.

  • MKT_VAL (market value of equity): Closing price at the end of the fiscal year * common shares outstanding.

  • MTB (market-to-book ratio): MKT_VAL/book value of total stockholders’ equity.

  • NI_VOL (earnings volatility): Standard deviation of net income over the last five years.

  • RET_VOL (return volatility): Standard deviation of monthly stock returns over the last year.

  • PE (price-to-earnings ratio): Closing stock price at the end of the fiscal year/earnings per share.

  • AT (total assets): Compustat AT.

  • EARNINGS PERSISTENCE: Correlation between net income and net income in the following year.

  • SALES_GROWTH (year-over-year percentage change in revenue): (Revenue(t) − Revenue(t−1))/Revenue(t−1).

  • DIV (dividend indicator): Equal to 1 if a firm issued dividends, 0 otherwise.

  • SIZE (log of market value of equity): Log(common shares outstanding * price at the end of the fiscal year).

  • SI (special items): Total special items/total assets.

  • AGE (age of the firm): Number of years the firm appears in the CRSP monthly stock return file.

  • RESTATED_NUMS (indicator variable that equals 1 if reported numbers are restated): For all firms from 2001 to 2011 where both restated and original financial numbers are available in Compustat (datafmt = STD for original and datafmt = SUMM_STD for restated) and at least 10 variables have changed, we separate the original from the restated financial numbers and create an indicator equaling 1 for restated numbers.

  • INDUSTRY (industry classification): Groups companies into 17 industry portfolios based on the Fama-French industry classification.

Appendix 7

See Figs. 2, 3, 4 and Tables 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Fig. 2
figure 2

Aggregate Distribution and Benford’s Distribution. Figure 2 shows the similarity between Benford’s distribution and the aggregate distribution of all financial statement variables available on Compustat for the period 2001–2011. Not shown are distributions by industry and year, which similarly conform to Benford’s Law

Fig. 3
figure 3

Conformity to Benford’s distribution, firm examples. Figure 3 shows the conformity to Benford’s distribution for two firm-years: Sprint Nextel 2001, which does not conform to Benford’s Law (FSD Score based on the KS statistic = 0.224, FSD Score based on the MAD statistic = 0.052) and restated its financial results for that year, and Verizon Communications 2001, which does conform to Benford’s Law (FSD Score based on the KS statistic = 0.056, FSD Score based on the MAD statistic = 0.017)

Fig. 4
figure 4

Trend in FSD_Score for AAER and non-AAER firms. Figure 4 depicts the time trend in FSD_Score for firms identified by the SEC, through an Accounting and Auditing Enforcement Release, as having materially misstated their financial statements. Year 0 on the x axis is centered on the year the misstatement began to demonstrate how FSD_Score changes before and after the misstatement


Cite this article

Amiram, D., Bozanic, Z. & Rouen, E. Financial statement errors: evidence from the distributional properties of financial statement numbers. Rev Account Stud 20, 1540–1593 (2015). https://doi.org/10.1007/s11142-015-9333-z
