Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

Lugosi, Gábor; Mendelson, Shahar

doi:10.1007/s10208-019-09427-x

Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

Published: 05 August 2019

Volume 19, pages 1145–1190, (2019)
Cite this article

Foundations of Computational Mathematics Aims and scope Submit manuscript

Gábor Lugosi^1,2,3 &
Shahar Mendelson^4,5

2225 Accesses
60 Citations
Explore all metrics

Abstract

We survey some of the recent advances in mean estimation and regression function estimation. In particular, we describe sub-Gaussian mean estimators for possibly heavy-tailed data in both the univariate and multivariate settings. We focus on estimators based on median-of-means techniques, but other methods such as the trimmed-mean and Catoni’s estimators are also reviewed. We give detailed proofs for the cornerstone results. We dedicate a section to statistical learning problems—in particular, regression function estimation—in the presence of possibly heavy-tailed data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Robust Methods for High-Dimensional Regression and Covariance Matrix Estimation

Regression analysis: likelihood, error and entropy

Article 23 March 2018

Variable selection of varying dispersion student-t regression models

Article 28 November 2014

Notes

As we explain in what follows, it suffices to ensure that the comparison is correct between \(\mu \) and any point that is not too close to \(\mu \).
In the proof of Theorem 8, “well-behaved” means that (3.5) holds for a majority of the blocks.
The case \(q=3\) is the standard Berry–Esseen theorem, while for \(2<q<3\) one may use generalized Berry–Esseen bounds, see [71].
Note that one has the freedom to select a function \(\widehat{f}\) that does not belong to \({{\mathcal {F}}}\).

References

N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58:137–147, 2002.
Article MathSciNet Google Scholar
G. Aloupis. Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science, 72:147–158, 2006.
Article MathSciNet Google Scholar
M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
Book Google Scholar
J.-Y. Audibert and O. Catoni. Robust linear least squares regression. The Annals of Statistics, 39:2766–2794, 2011.
Article MathSciNet Google Scholar
Y. Baraud and L. Birgé. Rho-estimators revisited: General theory and applications. The Annals of Statistics, 46(6B):3767–3804, 2018.
Article MathSciNet Google Scholar
Y. Baraud, L. Birgé, and M. Sart. A new method for estimation and model selection: \(\rho \)-estimation. Inventiones Mathematicae, 207(2):425–517, 2017.
Article MathSciNet Google Scholar
P.L. Bartlett, O. Bousquet, and S. Mendelson. Localized Rademacher complexities. Annals of Statistics, 33:1497–1537, 2005.
Article MathSciNet Google Scholar
P.J. Bickel. On some robust estimates of location. The Annals of Mathematical Statistics, 36:847–858, 1965.
Article MathSciNet Google Scholar
A. Blumer, A. Ehrenfeucht, D. Haussler, and M.K. Warmuth. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36:929–965, 1989.
Article MathSciNet Google Scholar
S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities:A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
Book Google Scholar
C. Brownlees, E. Joly, and G. Lugosi. Empirical risk minimization for heavy-tailed losses. Annals of Statistics, 43:2507–2536, 2015.
Article MathSciNet Google Scholar
S. Bubeck, N. Cesa-Bianchi, and G. Lugosi. Bandits with heavy tail. IEEE Transactions on Information Theory, 59:7711–7717, 2013.
Article MathSciNet Google Scholar
P. Bühlmann and S. van de Geer. Statistics for high-dimensional data. Springer Series in Statistics. Springer, Heidelberg, 2011. Methods, theory and applications.
Chapter Google Scholar
O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 48(4):1148–1185, 2012.
Article MathSciNet Google Scholar
O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747, 2017.
O. Catoni and I. Giulini. Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector. arXiv preprint arXiv:1802.04308, 2018.
Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60. ACM, 2017.
Y. Cherapanamjeri, N. Flammarion, and P. Bartlett. Fast mean estimation with sub-Gaussian rates. arXiv preprint arXiv:1902.01998, 2019.
M. Chichignoud and J. Lederer. A robust, adaptive m-estimator for pointwise estimation in heteroscedastic regression. Bernoulli, 20(3):1560–1599, 2014.
Article MathSciNet Google Scholar
M.B. Cohen, Y.T. Lee, G. Miller, J. Pachocki, and A. Sidford. Geometric median in nearly linear time. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, pages 9–21. ACM, 2016.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York, 1996.
Book Google Scholar
L. Devroye, M. Lerasle, G. Lugosi, and R.I. Oliveira. Sub-Gaussian mean estimators. Annals of Statistics, 2016.
I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 655–664. IEEE, 2016.
I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), 2017.
I. Diakonikolas, G. Kamath, D.M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2683–2702. Society for Industrial and Applied Mathematics, 2018.
I. Diakonikolas, D.M. Kane, and A. Stewart. Efficient robust proper learning of log-concave distributions. arXiv preprint arXiv:1606.03077, 2016.
I. Diakonikolas, W. Kong, and A. Stewart. Efficient algorithms and lower bounds for robust linear regression. arXiv preprint arXiv:1806.00040, 2018.
J. Fan, Q. Li, and Y. Wang. Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(1):247–265, 2017.
Article MathSciNet Google Scholar
L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A distribution-free theory of nonparametric regression. Springer-Verlag, New York, 2002.
Book Google Scholar
F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, and W.A. Stahel. Robust statistics: the approach based on influence functions, volume 196. Wiley, 1986.
MATH Google Scholar
Q. Han and J.A. Wellner. A sharp multiplier inequality with applications to heavy-tailed regression problems. arXiv preprint arXiv:1706.02410, 2017.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
Article MathSciNet Google Scholar
S.B. Hopkins. Sub-Gaussian mean estimation in polynomial time. Annals of Statistics, 2019, to appear.
S.B. Hopkins and J. Li. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1021–1034. ACM, 2018.
D. Hsu. Robust statistics. http://www.inherentuncertainty.org/2010/12/robust-statistics.html, 2010.
D. Hsu and S. Sabato. Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17:1–40, 2016.
MathSciNet MATH Google Scholar
M. Huber. An optimal (\(\epsilon \), \(\delta \))-randomized approximation scheme for the mean of random variables with bounded relative variance. Random Structures & Algorithms, 2019.
P.J. Huber. Robust estimation of a location parameter. The annals of mathematical statistics, 35(1):73–101, 1964.
Article MathSciNet Google Scholar
P.J. Huber and E.M. Ronchetti. Robust statistics. Wiley, New York, 2009. Second edition.
M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:186–188, 1986.
Article MathSciNet Google Scholar
E. Joly, G. Lugosi, and R. I. Oliveira. On the estimation of the mean of a random vector. Electronic Journal of Statistics, 11:440–451, 2017.
Article MathSciNet Google Scholar
A. Klivans, P.K. Kothari, and R. Meka. Efficient algorithms for outlier-robust regression. In Proceedings of the 31st Annual Conference of Learning Theory (COLT 2018), 2018.
V. Koltchinskii. Oracle inequalities in empirical risk minimization and sparse recovery problems, volume 2033 of Lecture Notes in Mathematics. Springer, Heidelberg, 2011. Lectures from the 38th Probability Summer School held in Saint-Flour, 2008, École d’Été de Probabilités de Saint-Flour. [Saint-Flour Probability Summer School].
P.K. Kothari, J. Steinhardt, and D. Steurer. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035–1046. ACM, 2018.
Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on, pages 665–674. IEEE, 2016.
G. Lecué and M. Lerasle. Learning from mom’s principles: Le cam’s approach. arXiv preprint arXiv:1701.01961, 2017.
G. Lecué and M. Lerasle. Robust machine learning by median-of-means: theory and practice. Annals of Stastistics, 2019, to appear.
G. Lecué, M. Lerasle, and T. Mathieu. Robust classification via mom minimization. arXiv preprint arXiv:1808.03106, 2018.
G. Lecué and S. Mendelson. Learning subgaussian classes: Upper and minimax bounds. In S. Boucheron and N. Vayatis, editors, Topics in Learning Theory. Societe Mathematique de France, 2016.
G. Lecué and S. Mendelson. Performance of empirical risk minimization in linear aggregation. Bernoulli, 22(3):1520–1534, 2016.
Article MathSciNet Google Scholar
M. Ledoux. The concentration of measure phenomenon. American Mathematical Society, Providence, RI, 2001.
MATH Google Scholar
M. Ledoux and M. Talagrand. Probability in Banach Space. Springer-Verlag, New York, 1991.
Book Google Scholar
M. Lerasle and R. I. Oliveira. Robust empirical mean estimators. arXiv:1112.3914, 2012.
Po-Ling Loh and Xin Lu Tan. High-dimensional robust precision matrix estimation: Cellwise corruption under \(\epsilon \)-contamination. Electronic Journal of Statistics, 12(1):1429–1467, 2018.
Article MathSciNet Google Scholar
G. Lugosi and S. Mendelson. Robust multivariate mean estimation: the optimality of trimmed mean. manuscript, 2019.
G. Lugosi and S. Mendelson. Sub-Gaussian estimators of the mean of a random vector. Annals of Statistics, 47:783–794, 2019.
Article MathSciNet Google Scholar
G. Lugosi and S. Mendelson. Near-optimal mean estimators with respect to general norms. Probability Theory and Related Fields, 2019, to appear.
G. Lugosi and S. Mendelson. Regularization, sparse recovery, and median-of-means tournaments. Bernoulli, 2019, to appear.
G. Lugosi and S. Mendelson. Risk minimization by median-of-means tournaments. Journal of the European Mathematical Society, 2019, to appear.
P. Massart. Concentration inequalities and model selection. Ecole d’été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics. Springer, 2006.
S. Mendelson. Learning without concentration. Journal of the ACM, 62:21, 2015.
Article MathSciNet Google Scholar
S. Mendelson. An optimal unrestricted learning procedure. arXiv preprint arXiv:1707.05342, 2017.
S. Mendelson. Learning without concentration for general loss functions. Probability Theory and Related Fields, 171(1-2):459–502, 2018.
Article MathSciNet Google Scholar
S. Mendelson and N. Zhivotovskiy. Robust covariance estimation under \({L}_4-{L}_2\) norm equivalence. arXiv preprint arXiv:1809.10462, 2018.
S. Minsker. Geometric median and robust estimation in Banach spaces. Bernoulli, 21:2308–2335, 2015.
Article MathSciNet Google Scholar
Stanislav Minsker. Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. The Annals of Statistics, 46(6A):2871–2903, 2018.
Article MathSciNet Google Scholar
Stanislav Minsker. Uniform bounds for robust mean estimators. arXiv preprint arXiv:1812.03523, 2018.
Stanislav Minsker and Nate Strawn. Distributed statistical estimation and rates of convergence in normal approximation. arXiv preprint arXiv:1704.02658, 2017.
A.S. Nemirovsky and D.B. Yudin. Problem complexity and method efficiency in optimization. 1983.
Roberto I. Oliveira and Paulo Orenstein. The sub-Gaussian property of trimmed means estimators. Technical report, IMPA, 2019.
Valentin V Petrov. Limit theorems of probability theory: sequences of independent random variables. Technical report, Oxford, New York, 1995.
IG Shevtsova. On the absolute constants in the Berry–Esseen-type inequalities. In Doklady Mathematics, volume 89, pages 378–381. Springer, 2014.
C.G. Small. A survey of multidimensional medians. International Statistical Review, pages 263–277, 1990.
Google Scholar
S.M. Stigler. The asymptotic distribution of the trimmed mean. The Annals of Statistics, 1:472–477, 1973.
Article MathSciNet Google Scholar
B.S. Tsirelson, I.A. Ibragimov, and V.N. Sudakov. Norm of Gaussian sample function. In Proceedings of the 3rd Japan-U.S.S.R. Symposium on Probability Theory, volume 550 of Lecture Notes in Mathematics, pages 20–41. Springer-Verlag, Berlin, 1976.
A. B. Tsybakov. Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York, 2009.
Book Google Scholar
J.W. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975, volume 2, pages 523–531, 1975.
J.W. Tukey and D.H. McLaughlin. Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/winsorization 1. Sankhyā: The Indian Journal of Statistics, Series A, 25:331–352, 1963.
MathSciNet MATH Google Scholar
L.G. Valiant. A theory of the learnable. Communications of the ACM, 27:1134–1142, 1984.
Article Google Scholar
S. van de Geer. Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2000.
A.W. van der Waart and J.A. Wellner. Weak convergence and empirical processes. Springer, 1996.
Google Scholar
V.N. Vapnik and A.Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. (in Russian); German translation: Theorie der Zeichenerkennung, Akademie Verlag, Berlin, 1979.
R. Vershynin. Lectures in geometric functional analysis. 2009.

Download references

Acknowledgements

We thank Sam Hopkins, Stanislav Minsker, and Roberto Imbuzeiro Oliveira for illuminating discussions on the subject. We also thank two referees for their thorough reports and insightful comments.

Author information

Authors and Affiliations

Department of Economics and Business, Pompeu Fabra University, Barcelona, Spain
Gábor Lugosi
ICREA, Pg. Lluís Companys 23, 08010, Barcelona, Spain
Gábor Lugosi
Barcelona Graduate School of Economics, Barcelona, Spain
Gábor Lugosi
Mathematical Sciences Institute, The Australian National University, Canberra, Australia
Shahar Mendelson
LPSM, Sorbonne University, Paris, France
Shahar Mendelson

Authors

Gábor Lugosi
View author publications
You can also search for this author in PubMed Google Scholar
Shahar Mendelson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gábor Lugosi.

Additional information

Communicated by Albert Cohen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Gábor Lugosi was supported by the Spanish Ministry of Economy and Competitiveness, Grant MTM2015-67304-P and FEDER, EU, by “High-dimensional problems in structured probabilistic models - Ayudas Fundación BBVA a Equipos de Investigación Cientifica 2017” and by “Google Focused Award Algorithms and Learning for AI.” Shahar Mendelson was supported in part by the Israel Science Foundation.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lugosi, G., Mendelson, S. Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey. Found Comput Math 19, 1145–1190 (2019). https://doi.org/10.1007/s10208-019-09427-x

Download citation

Received: 19 October 2018
Revised: 19 October 2018
Accepted: 02 April 2019
Published: 05 August 2019
Issue Date: October 2019
DOI: https://doi.org/10.1007/s10208-019-09427-x

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

Abstract

Access this article

Similar content being viewed by others

Robust Methods for High-Dimensional Regression and Covariance Matrix Estimation

Regression analysis: likelihood, error and entropy

Variable selection of varying dispersion student-t regression models

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey

Abstract

Access this article

Similar content being viewed by others

Robust Methods for High-Dimensional Regression and Covariance Matrix Estimation

Regression analysis: likelihood, error and entropy

Variable selection of varying dispersion student-t regression models

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation