
Error Bounds in Reinforcement Learning Policy Evaluation

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3501)


Abstract

Building on Kearns and Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as an error bound for Monte Carlo matrix inversion policy evaluation. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo matrix inversion (MCMI), and temporal difference (TD) estimation methods for policy evaluation. We use these bounds to confirm the generally held view that the model-based ML and MCMI methods are more accurate estimators than the model-free TD method. Our error bounds also identify the parameters and conditions that affect each method’s estimation accuracy.
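To make the three estimators concrete, the following is a minimal NumPy sketch, not taken from the paper: it evaluates a fixed policy on a hypothetical three-state Markov reward process with a model-free TD(0) estimator, a model-based ML estimator (empirical transition matrix plus an exact Bellman solve), and an MCMI-flavoured rollout average. The transition matrix, rewards, step size, and sample sizes are all illustrative assumptions.

```python
import numpy as np

# Hypothetical three-state Markov reward process; the transition matrix P,
# reward vector r, and discount factor gamma are illustrative only.
rng = np.random.default_rng(0)
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9
n = len(r)

# Exact value function of the fixed policy: v = (I - gamma * P)^{-1} r.
v_true = np.linalg.solve(np.eye(n) - gamma * P, r)

# One long sample trajectory, shared by the TD and ML estimators.
T = 20_000
states = np.empty(T + 1, dtype=int)
states[0] = 0
for t in range(T):
    states[t + 1] = rng.choice(n, p=P[states[t]])

# Model-free TD(0): incremental bootstrapped updates along the trajectory.
v_td = np.zeros(n)
alpha = 0.01  # constant step size (illustrative)
for t in range(T):
    s, s_next = states[t], states[t + 1]
    v_td[s] += alpha * (r[s] + gamma * v_td[s_next] - v_td[s])

# Model-based ML: estimate P from transition counts, then solve the
# Bellman equation exactly under the estimated model.
counts = np.zeros((n, n))
for t in range(T):
    counts[states[t], states[t + 1]] += 1.0
P_hat = counts / counts.sum(axis=1, keepdims=True)
v_ml = np.linalg.solve(np.eye(n) - gamma * P_hat, r)

# MCMI-flavoured estimate: average truncated discounted returns over
# independent rollouts from each start state (a sampled Neumann series
# for (I - gamma * P)^{-1} r).
H, K = 100, 300  # rollout horizon and rollouts per state (illustrative)
v_mc = np.zeros(n)
for s0 in range(n):
    total = 0.0
    for _ in range(K):
        s, g = s0, 1.0
        for _ in range(H):
            total += g * r[s]
            g *= gamma
            s = rng.choice(n, p=P[s])
    v_mc[s0] = total / K

print("exact :", np.round(v_true, 3))
print("TD(0) :", np.round(v_td, 3))
print("ML    :", np.round(v_ml, 3))
print("MCMI  :", np.round(v_mc, 3))
```

With enough data all three estimates approach the exact solve; the paper’s bounds characterize how their errors differ at finite sample sizes.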


References

  1. Sutton, R.S.: Learning to predict by the method of Temporal Differences. Machine Learning 3, 9–44 (1988)

  2. Singh, S.P., Sutton, R.S.: Reinforcement Learning with Replacing Eligibility Traces. Machine Learning 22, 123–158 (1996)

  3. Barto, A.G., Duff, M.: Monte Carlo matrix inversion and reinforcement learning. In: NIPS: Proceedings of the 1994 Conference, pp. 687–694. Morgan Kaufmann, San Francisco (1994)

  4. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts (1998)

  5. Lu, F., Patrascu, R., Schuurmans, D.: Investigating the Maximum Likelihood alternative to TD(λ). In: Proceedings of the 19th ICML, pp. 403–410. Morgan Kaufmann, San Francisco (2002)

  6. Lu, F., Schuurmans, D.: Monte Carlo Matrix Inversion Policy Evaluation. In: UAI: Proceedings of the 19th Conference, pp. 386–393. Morgan Kaufmann, San Francisco (2003)

  7. Kearns, M., Singh, S.: Bias-variance error bounds for temporal difference updates. In: Proceedings of the 13th Annual Conference on Computational Learning Theory, pp. 142–147 (2000)

  8. Forsythe, G.E., Leibler, R.A.: Matrix inversion by a Monte Carlo Method. MTAC 4, 127–129 (1950)

  9. Kearns, M., Singh, S.: Finite-sample convergence rates for Q-learning and indirect algorithms. In: NIPS: Proceedings of the 1998 Conference, pp. 996–1002 (1998)

  10. Golub, G.H., Van Loan, C.F.: Matrix Computations. Johns Hopkins University Press, Baltimore (1989)

  11. Singh, S.P., Dayan, P.: Analytical Mean Squared Error Curves for Temporal Difference Learning. Machine Learning 32, 5–40 (1998)



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, F. (2005). Error Bounds in Reinforcement Learning Policy Evaluation. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_48


  • DOI: https://doi.org/10.1007/11424918_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25864-3

  • Online ISBN: 978-3-540-31952-8

  • eBook Packages: Computer Science (R0)
