
Integral priors for Bayesian model selection: how they operate from simple to complex cases

Original Paper

Abstract

In Bayesian model selection, default estimation priors are very often used for the sake of objectivity. However, these priors are usually improper, yielding indeterminate Bayes factors that preclude the comparison of the models. To overcome this difficulty, integral priors were proposed as prior distributions for Bayesian model selection in Cano et al. (Test 17(3):493–504, 2008). These priors are the solutions of a system of two integral equations and are the \(\sigma \)-finite invariant measures associated with a Markov chain. They have been further developed in Cano and Salmerón (Bayesian Anal 8(2):361–380, 2013) and applied to binomial regression models in Salmerón et al. (Stat Sin 25(3):1009–1023, 2015). One of the main advantages of this methodology is that it can be applied to compare both nested and non-nested models. Here, we present some applications of this methodology, along with some new technical developments, from the simplest case to more advanced ones to illustrate how it works. We begin with the toy example of a normal mean with known variance to show clearly how the methodology operates. Then, we consider the comparison of the normal location model with the double exponential one. Finally, we consider integral priors for the one-way heteroscedastic ANOVA, where the simulation of the Markov chains involves a Gibbs sampling algorithm, and we present some relevant conclusions and outline forthcoming research.


References

  • Bayarri MJ, Berger JO, Forte A, García-Donato G (2012) Criteria for Bayesian model choice with application to variable selection. Ann Stat 40:1550–1577

  • Berger JO, Pericchi LR (1996) The intrinsic Bayes factor for model selection and prediction. J Am Stat Assoc 91(433):109–122

  • Berger JO, Pericchi LR, Varshavsky J (1998) Bayes factors and marginal distributions in invariant situations. Sankhya A 60:307–321

  • Berger JO, Sellke T (1987) Testing a point null hypothesis: the irreconcilability of P values and evidence. J Am Stat Assoc 82(397):112–122

  • Cano JA, Kessler M, Moreno E (2004) On intrinsic priors for nonnested models. Test 13(2):445–463

  • Cano JA, Kessler M, Salmerón D (2007a) Integral priors for the one way random effects model. Bayesian Anal 2(1):59–68

  • Cano JA, Kessler M, Salmerón D (2007b) A synopsis of integral priors for the one way random effects model. In: Bernardo JM et al (eds) Bayesian statistics 8. Oxford University Press, Oxford, pp 577–582

  • Cano JA, Salmerón D (2013) Integral priors and constrained imaginary training samples for nested and non-nested Bayesian model comparison. Bayesian Anal 8(2):361–380

  • Cano JA, Salmerón D (2016) A review of the developments on integral priors for Bayesian model selection. Beio 32(2):96–111

  • Cano JA, Salmerón D, Robert CP (2008) Integral equation solutions as prior distributions for Bayesian model selection. Test 17(3):493–504

  • Diebolt J, Robert CP (1994) Estimation of finite mixture distributions by Bayesian sampling. J R Stat Soc Ser B 56:363–375

  • Eaton ML (1992) A statistical diptych: admissible inferences-recurrence of symmetric Markov chains. Ann Stat 20:1147–1179

  • Hobert JP, Robert CP (1999) Eaton’s Markov chain, its conjugate partner and P-admissibility. Ann Stat 27:361–373

  • León-Novelo L, Moreno E, Casella G (2012) Objective Bayes model selection in probit models. Stat Med 31(4):353–365

  • Lindley DV (1957) A statistical paradox. Biometrika 44:187–192

  • Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and sampling schemes. Biometrika 81:27–40

  • Meyn SP, Tweedie RL (1993) Markov chains and stochastic stability. Springer, New York

  • Moreno E, Bertolino F, Racugno W (1998) An intrinsic limiting procedure for model selection and hypotheses testing. J Am Stat Assoc 93:1451–1460

  • Moreno E, Girón FJ, Casella G (2010) Consistency of objective Bayes factors as the model dimension grows. Ann Stat 38:1937–1952

  • Pérez JM, Berger JO (2002) Expected posterior priors for model selection. Biometrika 89(3):491–512

  • Salmerón D, Cano JA, Robert CP (2015) Objective Bayesian hypothesis testing in binomial regression models with integral prior distributions. Stat Sin 25(3):1009–1023

  • Womack AJ, León-Novelo L, Casella G (2014) Inference from intrinsic Bayes procedures under model selection and uncertainty. J Am Stat Assoc 109(507):1040–1053


Acknowledgements

This research was supported by the Séneca Foundation Programme for the Generation of Excellence Scientific Knowledge under Project 15220/PI/10.

Author information

Correspondence to J. A. Cano.

Appendices

Appendix 1: Integral and intrinsic priors for testing a point null hypothesis in a normal model with unknown mean and variance

1.1 Intrinsic priors for nested models with the default estimation prior for model \(M_1\) as its intrinsic prior

In this case there are two ways to obtain the intrinsic prior \(\pi _2(\theta _2)\) for model \(M_2\), see Moreno et al. (1998). The first one is to obtain it from the posterior distribution as

$$\begin{aligned} \pi _2(\theta _2)= & {} \int \pi _2^N(\theta _2\mid x)m_1^N(x)\mathrm{d}x=\int \frac{f_2(x\mid \theta _2)\pi _2^N(\theta _2)}{m_2^N(x)}m_1^N(x)\mathrm{d}x \\= & {} \pi _2^N(\theta _2)\int \frac{f_2(x\mid \theta _2)m_1^N(x)}{m_2^N(x)}\mathrm{d}x=\pi _2^N(\theta _2)E_{f_2(x\mid \theta _2)}\left( \frac{m_1^N(x)}{m_2^N(x)}\right) , \end{aligned}$$

and the second one follows from Fubini’s theorem as

$$\begin{aligned} \pi _2(\theta _2)= & {} \int \pi _2^N(\theta _2\mid x)m_1^N(x)\mathrm{d}x=\int \pi _2^N(\theta _2\mid x)f_1(x\mid \theta _1)\pi _1^N(\theta _1)\mathrm{d}\theta _1\mathrm{d}x \\= & {} \int \left( \int \pi _2^N(\theta _2\mid x)f_1(x\mid \theta _1)\mathrm{d}x\right) \pi _1^N(\theta _1)\mathrm{d}\theta _1=\int \pi (\theta _2\mid \theta _1)\pi _1^N(\theta _1)\mathrm{d}\theta _1. \end{aligned}$$

This second way is more convenient to work with and is the one commonly used in the literature on intrinsic priors. Now, expressing \(\pi _2(\theta _2)\) as

$$\begin{aligned} \pi _2^N(\theta _2)E_{f_2(x\mid \theta _2)}\left( \frac{m_1^N(x)}{m_2^N(x)}\right) =\int \pi _2^N(\theta _2\mid x)m_1^N(x)\mathrm{d}x, \end{aligned}$$

it is clear that integral priors generalize intrinsic priors: the right-hand side is precisely one of the two integral equations, with the default marginal \(m_1^N(x)\) in place of the marginal \(m_1(x)\) under the integral prior.

In the case where the simpler model is a point null hypothesis, \(H_0:\theta _1=\theta _{10}\), the two ways yield \(\pi _2(\theta _2)\) as

$$\begin{aligned} \pi _2^N(\theta _2)E_{f_2(x\mid \theta _2)}\left( \frac{f_1(x\mid \theta _{10})}{m_2^N(x)}\right) , \end{aligned}$$

and

$$\begin{aligned} \int \pi _2^N(\theta _2\mid x)f_1(x\mid \theta _{10})\mathrm{d}x, \end{aligned}$$

respectively, which, of course, are the same.
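The second expression lends itself directly to simulation. The following minimal sketch (Python with NumPy is assumed; the function and argument names are illustrative and not from the paper) estimates \(\pi _2(\theta _2)\) at a given point by averaging \(\pi _2^N(\theta _2\mid x)\) over minimal training samples x drawn from \(f_1(\cdot \mid \theta _{10})\):

```python
# Minimal sketch (names are illustrative, not from the paper): Monte Carlo
# evaluation of the point-null intrinsic prior
#   pi_2(theta_2) = int pi_2^N(theta_2 | x) f_1(x | theta_10) dx,
# given a sampler for minimal training samples under M_1 and the posterior
# density under M_2 with its default estimation prior.
import numpy as np

def intrinsic_prior_point_null(theta2, sample_x_under_m1, posterior2_density,
                               n_mc=10_000, seed=0):
    """Average pi_2^N(theta2 | x) over training samples x ~ f_1(. | theta_10)."""
    rng = np.random.default_rng(seed)
    values = [posterior2_density(theta2, sample_x_under_m1(rng))
              for _ in range(n_mc)]
    return float(np.mean(values))
```

For the comparison treated in the next subsection, for instance, `sample_x_under_m1` would draw \(x=(x_1,x_2)\) from the \(N(0,1)\) model and `posterior2_density` would evaluate the posterior given in Appendix 1.3.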

1.2 Intrinsic and integral priors for the comparison of \(M_1:N(0,1)\) versus \(M_2:N(\theta ,\sigma ^2)\)

To compare \(M_1:N(0,1)\) versus \(M_2:N(\theta ,\sigma ^2)\), with default estimation prior \(\pi _2^N(\theta ,\sigma )\propto 1/\sigma \), using the first expression and taking into account that \({m_2^N(x)}\) is equal to \(\frac{1}{2\vert x_1-x_2\vert }\), see Berger and Pericchi (1996), the intrinsic prior is

$$\begin{aligned} \frac{1}{\sigma }\int 2\vert x_1-x_2\vert N(x_1\mid \theta ,\sigma ^2)N(x_2\mid \theta ,\sigma ^2)N(x_1\mid 0,1)N(x_2\mid 0,1)\mathrm{d}x_1\mathrm{d}x_2, \end{aligned}$$

which, using the change of variables \(u=x_1-x_2\) and \(v=x_1+x_2\), yields

$$\begin{aligned}&\frac{1}{\sigma }\int \frac{1}{2}2\vert u\vert N((u+v)/2\mid \theta ,\sigma ^2) \\&\quad \times N((v-u)/2\mid \theta ,\sigma ^2)N((u+v)/2\mid 0,1)N((v-u)/2\mid 0,1)\mathrm{d}u\mathrm{d}v \end{aligned}$$
$$\begin{aligned}= & {} \frac{1}{\sigma }\int \vert u\vert \frac{\exp \left( -\frac{\theta ^2}{\sigma ^2+1}-\frac{1}{4} \left( \frac{1}{\sigma ^2}+1\right) u^2\right) }{2 \pi ^{3/2} \sigma ^2\sqrt{\frac{1}{\sigma ^2}+1} }\mathrm{d}u=\frac{1}{\sigma }\frac{2 e^{-\frac{\theta ^2}{\sigma ^2+1}}}{\pi ^{3/2} \sqrt{\frac{1}{\sigma ^2}+1} \left( \sigma ^2+1\right) } \\= & {} N(\theta \mid 0,(\sigma ^2+1)/2)\frac{2}{\pi (1+\sigma ^2)}. \end{aligned}$$

Of course, since we are dealing with a point null hypothesis, integral priors and intrinsic priors coincide and are unique. Nevertheless, as shown next, we can also obtain the integral prior explicitly using the steps of the associated Markov chain, which again highlights that integral and intrinsic priors run in parallel in nested situations, whereas integral priors can go further.
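As a quick numerical check, not part of the paper, the closed form above can be compared with direct quadrature of the integral expression for a few arbitrary values of \((\theta ,\sigma )\); the sketch below assumes Python with NumPy and SciPy:

```python
# Minimal numerical check (assumption: the grid of (theta, sigma) values is
# arbitrary) of the closed-form intrinsic prior derived above.
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import norm

def intrinsic_prior_quadrature(theta, sigma):
    """(1/sigma) * int 2|x1 - x2| N(x1|theta,sigma^2) N(x2|theta,sigma^2)
    N(x1|0,1) N(x2|0,1) dx1 dx2, evaluated by numerical quadrature."""
    def integrand(x2, x1):
        return (2 * abs(x1 - x2)
                * norm.pdf(x1, theta, sigma) * norm.pdf(x2, theta, sigma)
                * norm.pdf(x1) * norm.pdf(x2))
    val, _ = dblquad(integrand, -np.inf, np.inf, -np.inf, np.inf)
    return val / sigma

def intrinsic_prior_closed_form(theta, sigma):
    """N(theta | 0, (sigma^2 + 1)/2) * 2 / (pi (1 + sigma^2))."""
    return (norm.pdf(theta, 0, np.sqrt((sigma**2 + 1) / 2))
            * 2 / (np.pi * (1 + sigma**2)))

for theta, sigma in [(0.0, 1.0), (1.5, 0.5), (-2.0, 2.0)]:
    print(theta, sigma,
          intrinsic_prior_quadrature(theta, sigma),
          intrinsic_prior_closed_form(theta, sigma))
```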

1.3 Obtaining the integral prior for \(M_2:N(\theta ,\sigma ^2)\) using the associated Markov chain

The minimal training sample is \(x=(x_1,x_2)\), and the posterior distribution with the default estimation prior \(\pi _2^N(\theta ,\sigma )\propto 1/\sigma \) is therefore

$$\begin{aligned} \pi _2^N(\theta ,\sigma \mid x)\propto \sigma ^{-3}\exp \left( -\frac{1}{2\sigma ^2} \left( s^2+2(\theta -\bar{x})^2\right) \right) , \end{aligned}$$

with \(s^2=(x_1-x_2)^2/2\) and \(\bar{x}=(x_1+x_2)/2\). Then, \(\pi _2^N(\theta \mid \sigma ,x)=N(\theta \mid \bar{x},\sigma ^2/2)\) and

$$\begin{aligned} \pi _2^N(\sigma \mid x)\propto \sigma ^{-2}\exp \left( -\frac{s^2}{2\sigma ^2}\right) . \end{aligned}$$

Now, the associated Markov chain transition consists of just two steps, as in the toy example, since we are again dealing with a point null hypothesis. First, x is simulated from model \(M_1\), and second, \((\theta ,\sigma )\) is simulated from \(\pi _2^N(\theta ,\sigma \mid x)\). Then, \( \theta =\bar{x}+\sigma \varepsilon _1/\sqrt{2}, \) where \(\varepsilon _1\sim N(0,1)\) and \(\bar{x}\sim N(0,1/2)\), and therefore \(\pi (\theta \mid \sigma )=N(\theta \mid 0,(\sigma ^2+1)/2)\) and \( \pi (\sigma )= \int \pi _2^N(\sigma \mid x)p(s^2)\mathrm{d}s^2, \) where \(p(s^2)\) is the \(\chi ^2_1\) density. Then, normalizing \(\pi _2^N(\sigma \mid x)\) and carrying out this integration, we obtain \( \pi (\sigma )=2/\{\pi (\sigma ^2+1)\}, \) and finally we obtain again \(\pi (\theta ,\sigma )= 2N(\theta \mid 0,(\sigma ^2+1)/2)/\{\pi (\sigma ^2+1)\}\).
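The following minimal sketch (assuming Python with NumPy; the number of draws is arbitrary) simulates this two-step transition. Since \(M_1\) is a point null, the step that simulates x does not depend on the previous state, so each draw of \((\theta ,\sigma )\) is already an independent draw from the integral prior:

```python
# Minimal sketch of the two-step transition described above (NumPy only;
# the number of draws and the seed are arbitrary choices).
import numpy as np

rng = np.random.default_rng(0)
n_draws = 100_000

# Step 1: simulate the minimal training sample x = (x1, x2) from M_1 = N(0, 1).
x = rng.standard_normal((n_draws, 2))
xbar = x.mean(axis=1)                          # xbar ~ N(0, 1/2)
s = np.abs(x[:, 0] - x[:, 1]) / np.sqrt(2)     # s^2 = (x1 - x2)^2 / 2 ~ chi^2_1

# Step 2: simulate (theta, sigma) from pi_2^N(theta, sigma | x).
# pi_2^N(sigma | x) propto sigma^{-2} exp(-s^2/(2 sigma^2)); equivalently
# s/sigma = |Z| with Z ~ N(0, 1), so sigma = s / |Z|.
sigma = s / np.abs(rng.standard_normal(n_draws))
# pi_2^N(theta | sigma, x) = N(theta | xbar, sigma^2 / 2).
theta = xbar + sigma * rng.standard_normal(n_draws) / np.sqrt(2)

# Empirical check against pi(sigma) = 2 / (pi (1 + sigma^2)), a half-Cauchy
# density whose median is 1.
print(np.median(sigma))
```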

Appendix 2: Computation of the marginal \(m_{2}(\mathbf {x})\) in the comparison of the normal model versus the double exponential one

The order statistics of \(\mathbf {x}\), \((x_{(1)},\ldots ,x_{(n)})\), are needed to compute the marginal \(m_{2}(\mathbf {x})\). Denoting \(R_{1}=(-\infty ,x_{(1)})\), \(R_{j}=(x_{(j-1)}, x_{(j)})\) for \(j=2,\ldots ,n\) and \(R_{n+1}=(x_{(n)},\infty )\), we have that

$$\begin{aligned} m_{2}(\mathbf {x})=\sum _{j=1}^{n+1}\int _{R_{j}}2^{-n} \exp \left( -\sum _{i=1}^{n}|x_{i}-\lambda |\right) \mathrm{d}\lambda . \end{aligned}$$

Let \(H_{j}(\mathbf x ,\lambda )\), \(j=1,\ldots ,n+1,\) be the value of the function \(-\sum _{i=1}^{n}|x_{i}-\lambda |\) in the region \(R_{j}\), that is, \(H_{1}(\mathbf x ,\lambda )=n\lambda -\sum _{i=1}^{n}x_{(i)},\) \(H_{n+1}(\mathbf x ,\lambda )=\sum _{i=1}^{n}x_{(i)}-n\lambda ,\) and \(H_{j}(\mathbf x ,\lambda )=\sum _{i=1}^{j-1}x_{(i)}+(n-2(j-1))\lambda -\sum _{i=j}^{n}x_{(i)}\) for \(j=2,\ldots ,n\). Then,

$$\begin{aligned} m_{2}(\mathbf x )=2^{-n}\sum _{j=1}^{n+1}\int _{R_{j}}\exp (H_{j}(\mathbf x ,\lambda ))\mathrm{d}\lambda =2^{-n}\sum _{j=1}^{n+1}I_{j}. \end{aligned}$$

Straightforward computations yield \(I_{1}=\frac{1}{n}\exp \left( (n-1)x_{(1)}-\sum _{i=2}^{n}x_{(i)}\right) .\) For the cases \(j=2,\ldots ,n\) and \(j\ne n/2+1\) we have:

$$\begin{aligned}&I_{j}=\frac{1}{n-2(j-1)}\exp \left( \sum _{i=1}^{j-1}x_{(i)}-\sum _{i=j}^{n}x_{(i)}\right) \\&\quad \times \left[ \exp \left( (n-2(j-1))x_{(j)}\right) -\exp \left( (n-2(j-1))x_{(j-1)}\right) \right] . \end{aligned}$$

For even n we obtain \(I_{j}\) for \(j=n/2+1\) as:

$$\begin{aligned} I_{j}=\exp \left( \sum _{i=1}^{j-1}x_{(i)}-\sum _{i=j}^{n}x_{(i)}\right) \left( x_{(j)}-x_{(j-1)}\right) , \end{aligned}$$

therefore, for the sake of simplicity and without loss of generality, we restrict ourselves to the case of odd n. Finally,

$$\begin{aligned} I_{n+1}=\frac{1}{n}\exp \left( \sum _{i=1}^{n-1}x_{(i)}-(n-1)x_{(n)}\right) . \end{aligned}$$
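Assuming Python with NumPy and SciPy (the data vector below is hypothetical), a direct implementation of these formulas for odd n, together with a brute-force quadrature check of \(m_{2}(\mathbf x )\), might look as follows:

```python
# Minimal sketch of the region-by-region computation of m_2(x) described
# above (odd n only; the data are hypothetical).
import numpy as np
from scipy.integrate import quad

def m2_double_exponential(x):
    """m_2(x) = int 2^{-n} exp(-sum_i |x_i - lambda|) dlambda, computed as
    2^{-n} (I_1 + ... + I_{n+1}) with the formulas above (odd n, so that
    n - 2(j - 1) never vanishes)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if n % 2 == 0:
        raise ValueError("this sketch implements the odd-n case only")
    cum = np.cumsum(x)                              # cum[k-1] = x_(1) + ... + x_(k)
    total = cum[-1]
    terms = [np.exp(n * x[0] - total) / n]          # I_1
    for j in range(2, n + 1):                       # I_j over R_j, j = 2, ..., n
        c = n - 2 * (j - 1)
        a = 2 * cum[j - 2] - total                  # sum_{i<j} x_(i) - sum_{i>=j} x_(i)
        terms.append(np.exp(a) * (np.exp(c * x[j - 1]) - np.exp(c * x[j - 2])) / c)
    terms.append(np.exp(total - n * x[-1]) / n)     # I_{n+1}
    return 2.0 ** (-n) * sum(terms)

# Brute-force check against direct numerical integration over lambda.
x = np.array([-1.3, 0.2, 0.5, 1.1, 2.4])
direct, _ = quad(lambda lam: 2.0 ** (-len(x)) * np.exp(-np.sum(np.abs(x - lam))),
                 -np.inf, np.inf)
print(m2_double_exponential(x), direct)
```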


Cite this article

Cano, J.A., Iniesta, M. & Salmerón, D. Integral priors for Bayesian model selection: how they operate from simple to complex cases. TEST 27, 968–987 (2018). https://doi.org/10.1007/s11749-018-0579-1
