Abstract
To deal with very large datasets, a mini-batch version of the Markov chain Monte Carlo stochastic approximation expectation–maximization (MCMC-SAEM) algorithm for general latent variable models is proposed. For exponential models, the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling substantially speeds up the convergence of the sequence of estimators generated by the algorithm. Moreover, insights into the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given.
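To make the mini-batch idea concrete, the following is a minimal sketch on a toy model, not the authors' implementation. It assumes observations y_i = z_i + eps_i with latent z_i ~ N(theta, 1) and noise eps_i ~ N(0, 1), so the complete-data sufficient statistic is the mean of the z_i and the M-step sets theta to that statistic. The function name `minibatch_saem`, the parameters `alpha` (mini-batch proportion) and `burn_in`, the random-walk Metropolis–Hastings proposal, and the step-size schedule are all illustrative assumptions.

```python
import math
import random


def minibatch_saem(y, alpha=0.2, n_iter=1500, burn_in=300, seed=0):
    """Toy mini-batch MCMC-SAEM (illustrative sketch, hypothetical names).

    Assumed model: y_i = z_i + eps_i, z_i ~ N(theta, 1), eps_i ~ N(0, 1).
    """
    rng = random.Random(seed)
    n = len(y)
    batch_size = max(1, int(alpha * n))
    theta = 0.0
    z = list(y)      # initial values of the latent variables
    s = sum(z) / n   # running estimate of the sufficient statistic

    def log_target(zi, yi, th):
        # log p(z_i | y_i, theta), up to an additive constant
        return -0.5 * (yi - zi) ** 2 - 0.5 * (zi - th) ** 2

    for k in range(1, n_iter + 1):
        # Simulation step: one Metropolis-Hastings move for the latent
        # variables in the mini-batch only; the others keep their values.
        for i in rng.sample(range(n), batch_size):
            proposal = z[i] + rng.gauss(0.0, 1.0)
            log_ratio = (log_target(proposal, y[i], theta)
                         - log_target(z[i], y[i], theta))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                z[i] = proposal

        # Stochastic approximation step with a decreasing step size
        # (constant during burn-in, then 1/(k - burn_in); an assumption).
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s += gamma * (sum(z) / n - s)

        # M-step: in this toy model the maximizer is the statistic itself.
        theta = s

    return theta


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(2.0, math.sqrt(2.0)) for _ in range(300)]
    print("mini-batch SAEM estimate:", minibatch_saem(y))
    print("sample mean (MLE here):  ", sum(y) / len(y))
```

In this toy model the observations are marginally N(theta, 2), so the maximum likelihood estimate is the sample mean, which gives a simple reference value for the output. Varying `alpha` changes how many latent variables are refreshed per iteration and hence the cost of each iteration.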
Acknowledgements
Work partly supported by the Grant ANR-18-CE02-0010 of the French National Research Agency ANR (Project EcoNet).
Cite this article
Kuhn, E., Matias, C. & Rebafka, T. Properties of the stochastic approximation EM algorithm with mini-batch sampling. Stat Comput 30, 1725–1739 (2020). https://doi.org/10.1007/s11222-020-09968-0