Pattern Classification and Learning Theory

Chapter in: Principles of Nonparametric Learning

Part of the book series: International Centre for Mechanical Sciences (CISM, volume 434)

Abstract

Pattern recognition (also called classification or discrimination) is about guessing or predicting the unknown class of an observation. An observation is a collection of numerical measurements, represented by a d-dimensional vector x. The unknown nature of the observation is called its class; it is denoted by y and takes values in the set {0, 1}. (For simplicity, we restrict our attention to binary classification.) In pattern recognition, one creates a function g: R^d → {0, 1} that represents one's guess of y given x. The mapping g is called a classifier. A classifier errs on x if g(x) ≠ y.
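To make this setup concrete, here is a minimal sketch of a classifier g: R^d → {0, 1} and its empirical error rate. It is not from the chapter: the linear threshold rule, the synthetic data, and the names (g, empirical_error, w, b) are all illustrative assumptions.

```python
import numpy as np

def g(x, w, b):
    """A hypothetical linear threshold classifier: guess class 1 if w.x + b > 0, else 0."""
    return int(np.dot(w, x) + b > 0)

def empirical_error(X, y, w, b):
    """Fraction of observations on which the classifier errs, i.e. g(x) != y."""
    guesses = np.array([g(x, w, b) for x in X])
    return float(np.mean(guesses != y))

# Toy example with d = 2: observations x in R^2 with classes y in {0, 1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # true classes for this synthetic data
w, b = np.array([1.0, 1.0]), 0.0         # one particular guess of a decision rule
print(empirical_error(X, y, w, b))       # 0.0: this rule matches the toy labels exactly
```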



Copyright information

© 2002 Springer-Verlag Wien

About this chapter

Cite this chapter

Lugosi, G. (2002). Pattern Classification and Learning Theory. In: Györfi, L. (ed.) Principles of Nonparametric Learning. International Centre for Mechanical Sciences, vol 434. Springer, Vienna. https://doi.org/10.1007/978-3-7091-2568-7_1

  • DOI: https://doi.org/10.1007/978-3-7091-2568-7_1

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-211-83688-0

  • Online ISBN: 978-3-7091-2568-7
