
An Information Theoretic Approach to Model Selection: A Tutorial with Monte Carlo Confirmation

  • Original Research
  • Published in Perspectives on Behavior Science

Abstract

A reliance on null hypothesis significance testing (NHST), and misinterpretations of its results, are thought to contribute to the replication crisis while impeding the development of a cumulative science. One solution is a data-analytic approach called Information-Theoretic (I-T) Model Selection, which builds on maximum likelihood estimates. In the I-T approach, the scientist examines a set of candidate models and determines, for each one, the probability that it is the closest to the truth of all models in the set. Although the theoretical development is subtle, the implementation of an I-T analysis is straightforward: models are ranked according to the probability that each is the best in light of the data collected. The approach encourages the examination of multiple models, something investigators desire and that NHST discourages. This article is structured to address two objectives. The first is to illustrate the application of I-T data analysis to data from a virtual experiment. A noisy delay-discounting data set is generated and seven quantitative models are examined. The illustration demonstrates that it is not necessary to know the “truth” to identify the model that is closest to it, and that the most probable models conform to the model that generated the data. Second, we examine claims made by advocates of the I-T approach using Monte Carlo simulations in which 10,000 different data sets are generated and analyzed. The simulations showed that (1) the model probabilities returned by the single virtual experiment approximated those that resulted from the simulations, (2) models deemed close to the truth produced the most precise parameter estimates, and (3) adding a single replicate sharpened the ability to identify the most probable model.



Acknowledgements

Special thanks to Kelly Banna, Alejandro Lazarte, and Dalisa Kendricks for helpful comments on earlier versions.

Supported by ES 024845 from NIH.

Author information

Corresponding author

Correspondence to M. Christopher Newland.


Appendices

Appendix 1

Table 5 Propagation of Calculations for Determining the AICc Weight / Model Probability

The model probabilities (AICc weights) can be calculated using a simple spreadsheet with formulas as shown below. One can begin with the residual sum of squares (RSS), the log likelihood (LogLik), the variance (\( \sigma^2 = \frac{RSS}{N} \)), the AIC, or the AICc, depending on what is provided by the software. The term that is used determines how far to the right in the table one begins the calculation. Note that K is 1 + the degrees of freedom, because the estimate of the AICc includes an additional parameter for the variance. If using AICc or AIC from a software package, check that it uses the correct value for K. If estimating the AICc from the RSS or variance directly, then the estimate in the table’s footnote can be used.
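For convenience, the footnote estimate referred to above, computed from the RSS under the usual assumption of least-squares fitting with Gaussian errors, is the standard small-sample expression:

\[
\mathrm{AICc} \;=\; N \ln\!\left(\frac{RSS}{N}\right) \;+\; 2K \;+\; \frac{2K(K+1)}{N-K-1}
\]

where \(N\) is the number of observations and \(K\) counts the estimated parameters, including the variance term.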

The following steps can be used when conducting an I-T analysis:

  1. Collect data.

  2. Analyze the data using an analysis appropriate to the data set, such as linear or nonlinear regression, mixed-effects modeling, or analysis of variance.

  3. Extract from that analysis one of the following: the residual sum of squares (RSS), the likelihood, the log likelihood, the AIC, or the AICc.

    a. If the RSS is used, then use Table A1 to estimate the AICc.

    b. If the likelihood is used, then take the log-likelihood. Any base can be used as long as it is consistent, but typically the base is the natural logarithm, e. Then proceed from the fourth column of Table A1.

    c. If the AIC is extracted, then add the correction factor 2K(K + 1)/(N − K − 1) and proceed from the fifth column of Table A1.

  4. Verify that the appropriate K is used. This will usually be the number of regression parameters estimated, plus 1 for the variance and 1 for an intercept (if that is not already included).

  5. Construct a table similar to Table A1.

  6. Locate the smallest AICc and place it in Row 1 of the table (for convenience).

  7. From the AICc column, construct the Delta AICc column by subtracting the smallest AICc from each AICc.

  8. Calculate the evidence ratios by taking exp(−0.5 × (AICc − AICc MIN)). This is the probability that the model is at least as good as the one with the smallest AICc. Note that the evidence ratio of the best model will be 1.0, because the probability that it is at least as good as itself is 1.0.

  9. Calculate the AICc weights by dividing each evidence ratio (Column 7) by the sum of all the evidence ratios.

  10. Verify that the sum of the AICc weights equals 1.0. Each AICc weight is the probability that the corresponding model is the best of the candidate set.
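The steps above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical RSS values and parameter counts, assuming least-squares fits with Gaussian errors so that the AICc can be computed from the RSS:

```python
import numpy as np

def aicc_from_rss(rss, n, k):
    # AICc from the residual sum of squares under Gaussian errors:
    # AIC = n*ln(RSS/n) + 2K, plus the small-sample correction 2K(K+1)/(n-K-1).
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def akaike_weights(aicc):
    aicc = np.asarray(aicc, dtype=float)
    delta = aicc - aicc.min()      # Delta AICc (Step 7)
    rel = np.exp(-0.5 * delta)     # evidence ratios (Step 8)
    return rel / rel.sum()         # AICc weights (Step 9)

# Hypothetical example: three candidate models fit to N = 12 data points.
# Each K counts the estimated regression parameters plus 1 for the variance (Step 4).
rss = [4.2, 5.1, 9.8]
k = [3, 4, 2]
w = akaike_weights([aicc_from_rss(r, 12, ki) for r, ki in zip(rss, k)])
print(w, w.sum())                  # the weights sum to 1.0 (Step 10)
```

The model with the smallest AICc receives the largest weight; the weights can then be read directly as the probability that each model is the best of the candidate set.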

Appendix 2

Detailed Example of AICc Performance from Simulations

The data in Table 4 are single-point estimates of the percentage of times that each model was ranked best, but this ordinal measure cannot fully represent the I-T approach because an AICc can be the best by a small or a large amount. The histograms in Fig. 6 illustrate the distribution of actual model probabilities (AICc weights) from these 10,000 data sets and how adding a second replicate dramatically changes the ability of the AICc to converge on good models (Row 2). With only one replicate per fit (top row), the AICc weights for the hyperbolic + intercept model were heavily left-skewed, with most weights greater than 0.75, meaning that on most runs this model was judged to have at least a 75% chance of being the best model. The hyperboloid + intercept and hyperbolic models overlapped considerably, consistent with their switching places in Table 4. The hyperboloid and exponential models fared more poorly; the exponential model rarely had an AICc weight greater than 0.25.

Row 2 shows that merely adding a second estimate to a single discounting function sharpened the discriminability of the AICc weights. The hyperbolic + intercept model showed a sharper right peak, but the three poorest models’ left peaks were also much sharper, indicating that their AICc weights were usually less than 0.05.

The proportions (third and sixth columns) and model probabilities (fourth and seventh columns) in Table 4 are consistent with the histograms in Fig. 6. The hyperbolic + intercept model was most frequently selected as the most probable one across the 10,000 different data sets, but the outliers indicate that on rare occasions a data set was generated for which one of the other models, exclusive of the linear one, was deemed probable. Thus, a single experiment stands a good chance of predicting model probabilities, and a replication that produces similar values will provide even greater confidence in the conclusions. The linear (and mean, not shown) models were never selected as good models. These analyses demonstrate the strength of the AICc approach because, in most cases, the highest-ranked model was the model that was actually used to generate the data sets.
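The logic of these simulations can be illustrated in miniature with a single synthetic data set. In the sketch below, all numbers are hypothetical, and a grid search over the discount rate (with the amplitude solved in closed form) stands in for the full nonlinear-regression fits used in the article; only two of the seven candidate models are compared:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generating model: hyperbolic discounting, V = A / (1 + k*d).
delays = np.array([0.0, 1, 7, 30, 90, 180, 365])
data = 100 / (1 + 0.05 * delays) + rng.normal(0, 3, size=delays.size)

def fit_rss(shape, k_grid):
    """Best RSS over a grid of discount rates; amplitude A solved in closed form."""
    best = np.inf
    for k in k_grid:
        f = shape(delays, k)
        a = f @ data / (f @ f)                     # least-squares A for fixed k
        best = min(best, np.sum((data - a * f) ** 2))
    return best

def aicc(rss, n, n_params):
    kk = n_params + 1                              # + 1 for the variance term
    return n * np.log(rss / n) + 2 * kk + 2 * kk * (kk + 1) / (n - kk - 1)

k_grid = np.logspace(-4, 0, 400)
models = {
    "hyperbolic":  lambda d, k: 1 / (1 + k * d),
    "exponential": lambda d, k: np.exp(-k * d),
}
scores = np.array([aicc(fit_rss(f, k_grid), delays.size, 2) for f in models.values()])
weights = np.exp(-0.5 * (scores - scores.min()))
weights /= weights.sum()
print(dict(zip(models, weights)))  # probability that each model is the best of the pair
```

Because the data were generated by the hyperbolic model, its AICc weight should dominate on most random seeds; repeating this over many simulated data sets yields the kind of weight distributions shown in Fig. 6.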

Fig. 6

Distribution of AICc weights (model probabilities) for five model types. Fits were constructed using one (top row) or two (bottom row) replicates per delay interval. Each horizontal axis can be interpreted as the probability that a model is the best of the seven models examined in Table 3. Note that the hyperboloid + intercept model was usually the most probable model of the data set, but that for some data sets other models were ranked as the most probable.


About this article


Cite this article

Newland, M.C. An Information Theoretic Approach to Model Selection: A Tutorial with Monte Carlo Confirmation. Perspect Behav Sci 42, 583–616 (2019). https://doi.org/10.1007/s40614-019-00206-1
