Abstract
Approximate Bayesian inference on the basis of summary statistics is well suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, methods that use rejection suffer from the curse of dimensionality as the number of summary statistics increases. Here we propose a machine-learning approach to estimating the posterior density that introduces two innovations. The new method first fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves the estimate using importance sampling. The new algorithm is compared with state-of-the-art approximate Bayesian methods and achieves a considerable reduction of the computational burden in two examples of inference, one in statistical genetics and one in a queueing model.
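To make the regression idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy Gaussian simulator, a single summary statistic (the sample mean), and a quadratic least-squares fit standing in for the nonlinear regression. The adjustment formula used here (recentring each accepted parameter at the fitted conditional mean and rescaling by the ratio of fitted conditional standard deviations) is one common way to write a heteroscedastic correction and is an assumption of this example. The adaptive importance-sampling step is omitted. All function names are hypothetical.

```python
import numpy as np


def simulate_summaries(theta, n_obs, rng):
    """Toy simulator (assumption): the summary statistic is the mean of
    n_obs draws from Normal(theta, 1), computed for each parameter value."""
    data = rng.normal(loc=theta[:, None], scale=1.0, size=(theta.size, n_obs))
    return data.mean(axis=1)


def design(s_centered):
    """Quadratic design matrix in the summary statistic, centred at s_obs."""
    return np.column_stack([np.ones_like(s_centered), s_centered, s_centered**2])


def abc_regression_adjust(theta, s, s_obs, accept_frac=0.2):
    """Rejection ABC followed by a heteroscedastic regression adjustment.

    Returns the accepted parameters and their adjusted counterparts.
    """
    # Rejection step: keep the simulations whose summaries are closest to s_obs.
    dist = np.abs(s - s_obs)
    keep = dist <= np.quantile(dist, accept_frac)
    th, ss = theta[keep], s[keep]

    # Conditional mean m(s): least-squares fit of theta on the summaries.
    X = design(ss - s_obs)
    beta, *_ = np.linalg.lstsq(X, th, rcond=None)
    resid = th - X @ beta

    # Conditional variance sigma^2(s): fit the log of the squared residuals.
    gamma, *_ = np.linalg.lstsq(X, np.log(resid**2 + 1e-12), rcond=None)
    sigma = np.exp(0.5 * (X @ gamma))

    # Because the design is centred, the fitted values at s_obs are the intercepts.
    m_obs = beta[0]
    sigma_obs = np.exp(0.5 * gamma[0])

    # Recentre at m(s_obs) and rescale the residuals by sigma(s_obs) / sigma(s).
    adjusted = m_obs + resid * sigma_obs / sigma
    return th, adjusted


rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 10.0, size=20000)          # draws from a flat prior
s = simulate_summaries(theta, n_obs=20, rng=rng)    # simulated summaries
rejected, adjusted = abc_regression_adjust(theta, s, s_obs=5.0)
print("rejection-only posterior sd:", rejected.std())
print("regression-adjusted posterior sd:", adjusted.std())
```

In this toy setting the adjusted sample is typically far more concentrated than the rejection-only sample at the same acceptance rate, which shows why a regression correction allows a looser tolerance and hence fewer simulations.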
Cite this article
Blum, M.G.B., François, O. Non-linear regression models for Approximate Bayesian Computation. Stat Comput 20, 63–73 (2010). https://doi.org/10.1007/s11222-009-9116-0
DOI: https://doi.org/10.1007/s11222-009-9116-0