On the Consistency of Discrete Bayesian Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4393)

Included in the following conference series: STACS 2007

Abstract

This paper accomplishes the last step in a series of consistency theorems for Bayesian learners based on a discrete hypothesis class, a line of work initiated by Solomonoff in 1978. Specifically, we generalize a performance guarantee for Bayesian stochastic model selection, recently proven by the author for finite observation spaces, to countable and continuous observation spaces as well as to mixtures. This strong result is, to the author's knowledge, the first of its kind for stochastic model selection. It states almost sure consistency of the learner in the realizable case, that is, when one of the hypotheses (models) considered coincides with the truth. Moreover, it implies error bounds on the difference between the predictive distribution and the true one, and even loss bounds with respect to arbitrary loss functions. The set of consistency theorems for the three natural variants of discrete Bayesian prediction, namely marginalization, MAP, and stochastic model selection, is thus complete for general observation spaces. Hence, this is the right time to recapitulate all these results, to present them in a unified context, and to discuss the different settings and methods of Bayesian learning.
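To make the three prediction variants named above concrete, here is a minimal, self-contained sketch (not code from the paper): a discrete class of Bernoulli hypotheses, with parameter values and prior weights chosen purely for illustration, is updated by Bayes' rule, and a next-symbol prediction is then formed by marginalization, MAP, and stochastic model selection. In the realizable case shown here, all three predictions concentrate on the true parameter as observations accumulate, in line with the consistency results the paper discusses.

```python
import numpy as np

# Illustrative sketch only: a discrete hypothesis class of Bernoulli
# models. The parameters, prior, and sample size are hypothetical
# choices, not taken from the paper.

rng = np.random.default_rng(0)

thetas = np.array([0.1, 0.5, 0.9])    # discrete hypothesis class (assumed)
prior  = np.array([0.5, 0.25, 0.25])  # prior weights, summing to 1 (assumed)

truth = 0.9                           # realizable case: the truth is in the class
data = (rng.random(100) < truth).astype(int)

# Bayes update of the posterior over the discrete class.
posterior = prior.copy()
for x in data:
    likelihood = np.where(x == 1, thetas, 1 - thetas)
    posterior *= likelihood
    posterior /= posterior.sum()

# 1. Marginalization: mix the hypotheses' predictions by posterior weight.
p_marg = float(posterior @ thetas)

# 2. MAP: predict with the single a-posteriori most probable hypothesis.
p_map = float(thetas[np.argmax(posterior)])

# 3. Stochastic model selection: sample one hypothesis from the posterior
#    and predict with it.
p_stoch = float(thetas[rng.choice(len(thetas), p=posterior)])

print(p_marg, p_map, p_stoch)  # all three approach 0.9 as data grows
```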


References

  1. Poland, J.: The missing consistency theorem for Bayesian learning: Stochastic model selection. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds.) ALT 2006. LNCS (LNAI), vol. 4264, pp. 259–273. Springer, Heidelberg (2006)

  2. Blackwell, D., Dubins, L.: Merging of opinions with increasing information. Annals of Mathematical Statistics 33, 882–887 (1962)

  3. Clarke, B.S., Barron, A.R.: Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inform. Theory 36, 453–471 (1990)

  4. Barron, A.R., Rissanen, J.J., Yu, B.: The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theory 44, 2743–2760 (1998)

  5. Wallace, C.S., Boulton, D.M.: An information measure for classification. Computer Journal 11, 185–194 (1968)

  6. Wallace, C.S., Dowe, D.L.: Minimum message length and Kolmogorov complexity. Computer Journal 42, 270–283 (1999)

  7. Rissanen, J.J.: Fisher information and stochastic complexity. IEEE Trans. Inform. Theory 42, 40–47 (1996)

  8. Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Trans. Inform. Theory 37, 1034–1054 (1991)

  9. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2004)

  10. Poland, J., Hutter, M.: Asymptotics of discrete MDL for online prediction. IEEE Trans. Inform. Theory 51, 3780–3795 (2005)

  11. Grünwald, P., Langford, J.: Suboptimal behaviour of Bayes and MDL in classification under misspecification. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 331–347. Springer, Heidelberg (2004)

  12. Comley, J.W., Dowe, D.L.: Minimum message length and generalized Bayesian nets with asymmetric languages. In: Grünwald, P., Myung, I.J., Pitt, M.A. (eds.) Advances in Minimum Description Length: Theory and Applications, pp. 265–294 (2005)

  13. Solomonoff, R.J.: Complexity-based induction systems: comparisons and convergence theorems. IEEE Trans. Inform. Theory 24, 422–432 (1978)

  14. Poland, J., Hutter, M.: MDL convergence speed for Bernoulli sequences. Statistics and Computing 16, 161–175 (2006)

  15. Borovkov, A.A., Moullagaliev, A.: Mathematical Statistics. Gordon & Breach, Newark (1998)

Editor information

Wolfgang Thomas, Pascal Weil


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Poland, J. (2007). On the Consistency of Discrete Bayesian Learning. In: Thomas, W., Weil, P. (eds.) STACS 2007. Lecture Notes in Computer Science, vol 4393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70918-3_35

  • DOI: https://doi.org/10.1007/978-3-540-70918-3_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70917-6

  • Online ISBN: 978-3-540-70918-3

  • eBook Packages: Computer Science, Computer Science (R0)
