Pattern Classification and Learning Theory

Chapter in: Principles of Nonparametric Learning

Part of the book series: International Centre for Mechanical Sciences (CISM, volume 434)

Abstract

Pattern recognition (also called classification or discrimination) is about guessing or predicting the unknown class of an observation. An observation is a collection of numerical measurements, represented by a d-dimensional vector x. The unknown nature of the observation is called its class; it is denoted by y and takes values in the set {0, 1}. (For simplicity, we restrict our attention to binary classification.) In pattern recognition, one creates a function g: R^d → {0, 1} that represents one's guess of y given x. The mapping g is called a classifier. A classifier errs on x if g(x) ≠ y.
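To make this setup concrete, here is a minimal sketch of a classifier g: R^d → {0, 1} and its empirical error rate. It is not from the chapter: the linear threshold rule, the synthetic data, and the names (g, empirical_error, w, b) are all illustrative assumptions.

```python
import numpy as np

def g(x, w, b):
    """A hypothetical linear threshold classifier: guess class 1 if w.x + b > 0, else 0."""
    return int(np.dot(w, x) + b > 0)

def empirical_error(X, y, w, b):
    """Fraction of observations on which the classifier errs, i.e. g(x) != y."""
    guesses = np.array([g(x, w, b) for x in X])
    return float(np.mean(guesses != y))

# Toy example with d = 2: observations x in R^2 with classes y in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # true classes for this synthetic data
w, b = np.array([1.0, 1.0]), 0.0         # one particular guess of a decision rule
print(empirical_error(X, y, w, b))       # 0.0: this rule matches the toy labels exactly
```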



Copyright information

© 2002 Springer-Verlag Wien

About this chapter

Cite this chapter

Lugosi, G. (2002). Pattern Classification and Learning Theory. In: Györfi, L. (ed.) Principles of Nonparametric Learning. International Centre for Mechanical Sciences, vol 434. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2568-7_1

  • DOI: https://doi.org/10.1007/978-3-7091-2568-7_1

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-83688-0

  • Online ISBN: 978-3-7091-2568-7
