Abstract
We survey some of the recent advances in mean estimation and regression function estimation. In particular, we describe sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings. We focus on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni’s estimators are also reviewed. We give detailed proofs for the cornerstone results. We dedicate a section to statistical learning problems—in particular, regression function estimation—in the presence of possibly heavy-tailed data.
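To illustrate the median-of-means idea mentioned above, here is a minimal univariate sketch: split the sample into blocks, average each block, and return the median of the block averages. The names `median_of_means` and `num_blocks` are illustrative choices rather than notation from the survey, and discarding the leftover points when the sample size is not divisible by the number of blocks is a simplifying assumption.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Median-of-means estimate of the mean of a univariate sample.

    Assumption: the samples are i.i.d., so consecutive blocks can be used
    without shuffling. Any leftover points (len(samples) % num_blocks)
    are discarded for simplicity.
    """
    data = list(samples)
    if not 1 <= num_blocks <= len(data):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(data) // num_blocks
    # Empirical mean of each of the num_blocks disjoint blocks.
    block_means = [
        statistics.fmean(data[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    # The median is insensitive to a minority of badly behaved blocks.
    return statistics.median(block_means)


if __name__ == "__main__":
    sample = [1.0, 1.0, 1.0, 1.0, 1.0, 1000.0]
    print(median_of_means(sample, 3))  # one large value spoils only one block
    print(median_of_means(sample, 1))  # a single block is the empirical mean
```

With a single block the estimator reduces to the empirical mean. Using more blocks trades some accuracy on well-behaved data for protection against a few extreme observations.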
Notes
As we explain in what follows, it suffices to ensure that the comparison is correct between \(\mu \) and any point that is not too close to \(\mu \).
The case \(q=3\) is the standard Berry–Esseen theorem, while for \(2<q<3\) one may use generalized Berry–Esseen bounds, see [71].
Note that one has the freedom to select a function \(\widehat{f}\) that does not belong to \({{\mathcal {F}}}\).
References
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137–147, 1999.
G. Aloupis. Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science, 72:147–158, 2006.
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
J.-Y. Audibert and O. Catoni. Robust linear least squares regression. The Annals of Statistics, 39:2766–2794, 2011.
Y. Baraud and L. Birgé. Rho-estimators revisited: General theory and applications. The Annals of Statistics, 46(6B):3767–3804, 2018.
Y. Baraud, L. Birgé, and M. Sart. A new method for estimation and model selection: \(\rho \)-estimation. Inventiones Mathematicae, 207(2):425–517, 2017.
P.L. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Annals of Statistics, 33:1497–1537, 2005.
P.J. Bickel. On some robust estimates of location. The Annals of Mathematical Statistics, 36:847–858, 1965.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36:929–965, 1989.
S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
C. Brownlees, E. Joly, and G. Lugosi. Empirical risk minimization for heavy-tailed losses. Annals of Statistics, 43:2507–2536, 2015.
S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59:7711–7717, 2013.
P. Bühlmann and S. van de Geer. Statistics for high-dimensional data. Springer Series in Statistics. Springer, Heidelberg, 2011. Methods, theory and applications.
O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185, 2012.
O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747, 2017.
O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector. arXiv preprint arXiv:1802.04308, 2018.
M. Charikar, J. Steinhardt, and G. Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60. ACM, 2017.
Y. Cherapanamjeri, N. Flammarion, and P. Bartlett. Fast mean estimation with sub-Gaussian rates. arXiv preprint arXiv:1902.01998, 2019.
M. Chichignoud and J. Lederer. A robust, adaptive M-estimator for pointwise estimation in heteroscedastic regression. Bernoulli, 20(3):1560–1599, 2014.
M.B. Cohen, Y.T. Lee, G. Miller, J. Pachocki, and A. Sidford. Geometric median in nearly linear time. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 9–21. ACM, 2016.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1996.
L. Devroye, M. Lerasle, G. Lugosi, and R.I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics, 2016.
I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 655–664. IEEE, 2016.
I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), 2017.
I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2683–2702. Society for Industrial and Applied Mathematics, 2018.
I. Diakonikolas, D.M. Kane, and A. Stewart. Efficient robust proper learning of log-concave distributions. arXiv preprint arXiv:1606.03077, 2016.
I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. arXiv preprint arXiv:1806.00040, 2018.
J. Fan, Q. Li, and Y. Wang. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1):247–265, 2017.
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A distribution-free theory of nonparametric regression. Springer-Verlag, New York, 2002.
F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel. Robust statistics: the approach based on influence functions, volume 196. Wiley, 1986.
Q. Han and J.A. Wellner. A sharp multiplier inequality with applications to heavy-tailed regression problems. arXiv preprint arXiv:1706.02410, 2017.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
S.B. Hopkins. Sub-Gaussian mean estimation in polynomial time. Annals of Statistics, 2019, to appear.
S.B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1021–1034. ACM, 2018.
D. Hsu. Robust statistics. http://www.inherentuncertainty.org/2010/12/robust-statistics.html, 2010.
D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1–40, 2016.
M. Huber. An optimal (\(\epsilon \), \(\delta \))-randomized approximation scheme for the mean of random variables with bounded relative variance. Random Structures & Algorithms, 2019.
P.J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.
P.J. Huber and E.M. Ronchetti. Robust statistics. Wiley, New York, 2009. Second edition.
M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–188, 1986.
E. Joly, G. Lugosi, and R. I. Oliveira. On the estimation of the mean of a random vector. Electronic Journal of Statistics, 11:440–451, 2017.
A. Klivans, P.K. Kothari, and R. Meka. Efficient algorithms for outlier-robust regression. In Proceedings of the 31st Annual Conference of Learning Theory (COLT 2018), 2018.
V. Koltchinskii. Oracle inequalities in empirical risk minimization and sparse recovery problems, volume 2033 of Lecture Notes in Mathematics. Springer, Heidelberg, 2011. Lectures from the 38th Probability Summer School held in Saint-Flour, 2008, École d’Été de Probabilités de Saint-Flour. [Saint-Flour Probability Summer School].
P.K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035–1046. ACM, 2018.
K.A. Lai, A.B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 665–674. IEEE, 2016.
G. Lecué and M. Lerasle. Learning from MOM’s principles: Le Cam’s approach. arXiv preprint arXiv:1701.01961, 2017.
G. Lecué and M. Lerasle. Robust machine learning by median-of-means: theory and practice. Annals of Statistics, 2019, to appear.
G. Lecué, M. Lerasle, and T. Mathieu. Robust classification via MOM minimization. arXiv preprint arXiv:1808.03106, 2018.
G. Lecué and S. Mendelson. Learning subgaussian classes: Upper and minimax bounds. In S. Boucheron and N. Vayatis, editors, Topics in Learning Theory. Société Mathématique de France, 2016.
G. Lecué and S. Mendelson. Performance of empirical risk minimization in linear aggregation. Bernoulli, 22(3):1520–1534, 2016.
M. Ledoux. The concentration of measure phenomenon. American Mathematical Society, Providence, RI, 2001.
M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, New York, 1991.
M. Lerasle and R. I. Oliveira. Robust empirical mean estimators. arXiv:1112.3914, 2012.
P.-L. Loh and X.L. Tan. High-dimensional robust precision matrix estimation: Cellwise corruption under \(\epsilon \)-contamination. Electronic Journal of Statistics, 12(1):1429–1467, 2018.
G. Lugosi and S. Mendelson. Robust multivariate mean estimation: the optimality of trimmed mean. manuscript, 2019.
G. Lugosi and S. Mendelson. Sub-Gaussian estimators of the mean of a random vector. Annals of Statistics, 47:783–794, 2019.
G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms. Probability Theory and Related Fields, 2019, to appear.
G. Lugosi and S. Mendelson. Regularization, sparse recovery, and median-of-means tournaments. Bernoulli, 2019, to appear.
G. Lugosi and S. Mendelson. Risk minimization by median-of-means tournaments. Journal of the European Mathematical Society, 2019, to appear.
P. Massart. Concentration inequalities and model selection. École d’été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics. Springer, 2006.
S. Mendelson. Learning without concentration. Journal of the ACM, 62:21, 2015.
S. Mendelson. An optimal unrestricted learning procedure. arXiv preprint arXiv:1707.05342, 2017.
S. Mendelson. Learning without concentration for general loss functions. Probability Theory and Related Fields, 171(1-2):459–502, 2018.
S. Mendelson and N. Zhivotovskiy. Robust covariance estimation under \({L}_4-{L}_2\) norm equivalence. arXiv preprint arXiv:1809.10462, 2018.
S. Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21:2308–2335, 2015.
S. Minsker. Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903, 2018.
S. Minsker. Uniform bounds for robust mean estimators. arXiv preprint arXiv:1812.03523, 2018.
S. Minsker and N. Strawn. Distributed statistical estimation and rates of convergence in normal approximation. arXiv preprint arXiv:1704.02658, 2017.
A.S. Nemirovsky and D.B. Yudin. Problem complexity and method efficiency in optimization. 1983.
R.I. Oliveira and P. Orenstein. The sub-Gaussian property of trimmed means estimators. Technical report, IMPA, 2019.
V.V. Petrov. Limit theorems of probability theory: sequences of independent random variables. Oxford University Press, New York, 1995.
I.G. Shevtsova. On the absolute constants in the Berry–Esseen-type inequalities. Doklady Mathematics, 89:378–381, 2014.
C.G. Small. A survey of multidimensional medians. International Statistical Review, pages 263–277, 1990.
S.M. Stigler. The asymptotic distribution of the trimmed mean. The Annals of Statistics, 1:472–477, 1973.
B.S. Tsirelson, I.A. Ibragimov, and V.N. Sudakov. Norm of Gaussian sample function. In Proceedings of the 3rd Japan-U.S.S.R. Symposium on Probability Theory, volume 550 of Lecture Notes in Mathematics, pages 20–41. Springer-Verlag, Berlin, 1976.
A. B. Tsybakov. Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York, 2009.
J.W. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975, volume 2, pages 523–531, 1975.
J.W. Tukey and D.H. McLaughlin. Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/winsorization 1. Sankhyā: The Indian Journal of Statistics, Series A, 25:331–352, 1963.
L.G. Valiant. A theory of the learnable. Communications of the ACM, 27:1134–1142, 1984.
S. van de Geer. Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2000.
A.W. van der Vaart and J.A. Wellner. Weak convergence and empirical processes. Springer, 1996.
V.N. Vapnik and A.Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin, 1979.
R. Vershynin. Lectures in geometric functional analysis. 2009.
Acknowledgements
We thank Sam Hopkins, Stanislav Minsker, and Roberto Imbuzeiro Oliveira for illuminating discussions on the subject. We also thank two referees for their thorough reports and insightful comments.
Additional information
Communicated by Albert Cohen.
Gábor Lugosi was supported by the Spanish Ministry of Economy and Competitiveness, Grant MTM2015-67304-P and FEDER, EU, by “High-dimensional problems in structured probabilistic models - Ayudas Fundación BBVA a Equipos de Investigación Cientifica 2017” and by “Google Focused Award Algorithms and Learning for AI.” Shahar Mendelson was supported in part by the Israel Science Foundation.
About this article
Cite this article
Lugosi, G., Mendelson, S. Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey. Found Comput Math 19, 1145–1190 (2019). https://doi.org/10.1007/s10208-019-09427-x
DOI: https://doi.org/10.1007/s10208-019-09427-x
Keywords
- Mean estimation
- Heavy-tailed distributions
- Robustness
- Regression function estimation
- Statistical learning