Abstract
In multiple regression under the normal linear model, multicollinearity is well known to produce unreliable and unstable maximum likelihood estimates. This is particularly troublesome for variable selection, where it becomes more difficult to distinguish between subset models. Here we show how adding a spike-and-slab prior mitigates this difficulty by filtering the likelihood surface into a posterior distribution that allocates the relevant likelihood information to each of the subset model modes. To identify promising high-posterior models in this setting, we consider three EM algorithms: the fast closed-form EMVS version of Rockova and George (J Am Stat Assoc, 2014) and two new versions designed for variants of the spike-and-slab formulation. For a multimodal posterior under multicollinearity, we compare the regions of convergence of these three algorithms. Deterministic annealing versions of the EMVS algorithm are seen to substantially mitigate this multimodality. A single simple running example is used for illustration throughout.
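To make the EM idea concrete, the following is a minimal sketch of an EMVS-style iteration for a continuous spike-and-slab prior, where each coefficient is drawn from N(0, sigma2 * v0) (spike) or N(0, sigma2 * v1) (slab). It is not the authors' implementation. It assumes the error variance `sigma2` and the prior inclusion probability `theta` are held fixed, whereas the published algorithm also updates them. The function name `emvs_sketch`, all parameter names, the default values, and the toy data are hypothetical. The `temp` argument tempers the E-step in the spirit of deterministic annealing (Ueda and Nakano 1998); `temp=1.0` gives plain EM.

```python
import numpy as np

def emvs_sketch(X, y, v0=0.01, v1=10.0, theta=0.5, sigma2=1.0,
                temp=1.0, beta_init=None, n_iter=200, tol=1e-8):
    """EM search for a posterior mode under a spike-and-slab prior.

    Simplifying assumptions (not from the paper): sigma2 and theta are fixed.
    """
    n, p = X.shape
    beta = np.ones(p) if beta_init is None else np.asarray(beta_init, float).copy()
    XtX, Xty = X.T @ X, X.T @ y
    p_star = np.full(p, theta)
    for _ in range(n_iter):
        # E-step: conditional inclusion probabilities given the current beta.
        log_slab = np.log(theta) - 0.5 * np.log(v1) - beta**2 / (2 * sigma2 * v1)
        log_spike = np.log1p(-theta) - 0.5 * np.log(v0) - beta**2 / (2 * sigma2 * v0)
        # Tempered posterior: slab^temp / (slab^temp + spike^temp).
        p_star = 1.0 / (1.0 + np.exp(temp * (log_spike - log_slab)))
        # Expected prior precision of each coefficient (in units of 1/sigma2).
        d_star = (1.0 - p_star) / v0 + p_star / v1
        # M-step: generalized ridge solution with coefficient-specific penalties.
        beta_new = np.linalg.solve(XtX + np.diag(d_star), Xty)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, p_star

# Toy collinear design (hypothetical data): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + rng.normal(size=n)

beta_hat, p_star = emvs_sketch(X, y)
```

Because the posterior can be multimodal when predictors are collinear, different `beta_init` values may lead this iteration to different modes. Rerunning with several starting values, or with `temp` below 1, is one way to explore that behaviour on the toy data.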
References
Bar, H., Booth, J., Wells, M.: An empirical Bayes approach to variable selection and QTL analysis. Proceedings of the 25th International Workshop on Statistical Modelling, pp. 63–68. Glasgow, Scotland (2010)
Figueiredo, M.A.: Adaptive sparseness for supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1150–1159 (2003)
George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993)
George, E.I., McCulloch, R.E.: Approaches for Bayesian variable selection. Stat. Sin. 7, 339–373 (1997)
George, E., Rockova, V., Lesaffre, E.: Faster spike-and-slab variable selection with dual coordinate ascent EM. In: Proceedings of the 28th Workshop on Statistical Modelling, vol. 1, pp. 165–170 (2013)
Griffin, J., Brown, P.: Alternative prior distributions for variable selection with very many more variables than observations. In: Technical report, University of Warwick, University of Kent (2005)
Griffin, J.E., Brown, P.J.: Bayesian hyper-LASSOS with non-convex penalization. Aust. N. Z. J. Stat. 53, 423–442 (2012)
Hayashi, T., Iwata, H.: EM algorithm for Bayesian estimation of genomic breeding values. BMC Genetics 11, 1–9 (2010)
Kiiveri, H.: A Bayesian approach to variable selection when the number of variables is very large. Institute of Mathematical Statistics Lecture Notes—Monograph Series 40, 127–143 (2003)
Rockova, V., George, E.: EMVS: the EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 361 (2014, forthcoming)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14, 567–599 (2013)
Ueda, N., Nakano, R.: Deterministic annealing EM algorithm. Neural Netw. 11, 271–282 (1998)
Zellner, A.: On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel, P.K., Zellner, A. (eds.) Bayesian inference and decision techniques, pp. 233–243. Elsevier, North-Holland, Amsterdam
Additional information
The authors would like to thank the reviewers for very helpful suggestions. This work was supported by AHRQ Grant R21-HS021854.
About this article
Cite this article
Ročková, V., George, E.I. Negotiating multicollinearity with spike-and-slab priors. METRON 72, 217–229 (2014). https://doi.org/10.1007/s40300-014-0047-y
DOI: https://doi.org/10.1007/s40300-014-0047-y