Knowledge Gradient for Online Reinforcement Learning

  • Conference paper
Agents and Artificial Intelligence (ICAART 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8946)

Abstract

The most interesting challenge for a reinforcement learning agent is to learn online in an unknown, large discrete or continuous stochastic model. The agent must not only trade off exploration against exploitation, but also find a good set of basis functions to approximate the value function. We extend offline kernel-based LSPI (least squares policy iteration) to online learning. Online kernel-based LSPI combines features of offline kernel-based LSPI and online LSPI: it uses the knowledge gradient (KG) policy as an exploration policy to trade off exploration and exploitation, and the approximate linear dependency (ALD) based kernel sparsification method to select basis functions automatically. We compare online kernel-based LSPI with online LSPI on five discrete Markov decision problems, where online kernel-based LSPI outperforms online LSPI with respect to the performance of the learned policy.
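
The ALD-based kernel sparsification mentioned above goes back to Engel et al. [5]. Below is a minimal NumPy sketch of how such a dictionary can be built online; the function name, the Gaussian kernel, and the tolerance nu are our illustrative assumptions, not the authors' implementation.

    import numpy as np

    def ald_dictionary(samples, kernel, nu=0.1):
        """Approximate linear dependency (ALD) test, after Engel et al. [5].

        A sample x joins the dictionary only if its feature-space image
        phi(x) cannot be approximated, within squared error nu, by a
        linear combination of the images of the stored dictionary points.
        """
        dictionary = [samples[0]]
        # Inverse of the kernel (Gram) matrix of the current dictionary.
        K_inv = np.array([[1.0 / kernel(samples[0], samples[0])]])
        for x in samples[1:]:
            k_vec = np.array([kernel(d, x) for d in dictionary])
            a = K_inv @ k_vec                   # least-squares coefficients
            delta = kernel(x, x) - k_vec @ a    # squared approximation residual
            if delta > nu:                      # x is approximately independent
                # Grow K_inv with the standard block-matrix inversion update.
                n = len(dictionary)
                K_new = np.empty((n + 1, n + 1))
                K_new[:n, :n] = K_inv + np.outer(a, a) / delta
                K_new[:n, n] = -a / delta
                K_new[n, :n] = -a / delta
                K_new[n, n] = 1.0 / delta
                K_inv = K_new
                dictionary.append(x)
        return dictionary

    # Example: 50 one-dimensional states compress to a handful of centres.
    gauss = lambda x, y: np.exp(-0.5 * (x - y) ** 2)
    states = np.linspace(0.0, 1.0, 50)
    print(len(ald_dictionary(states, gauss, nu=0.01)))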

Notes

  1. In order not to overload the notation, we omit the time step t when it does not cause confusion.

  2. Note that [8] used a variant of the KG policy: the RMSE \(\hat{\bar{\sigma}}\) rather than the change in the RMSE \(\widetilde{\sigma}\) is used to compute the KG index \(V^{KG}\), giving a better trade-off between exploration and exploitation.
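
    For context, the online knowledge gradient policy of [9] scores each alternative by its current value estimate plus the expected gain of one more measurement; a generic sketch (the notation here is ours and may differ from the paper's):

    \[ x^{KG} = \arg\max_{x}\left(\mu_{x} + \tau\,\nu^{KG}_{x}\right), \qquad \nu^{KG}_{x} = \mathbb{E}\left[\max_{x'}\mu'_{x'}\,\middle|\, x\right] - \max_{x'}\mu_{x'}, \]

    where \(\mu_{x}\) is the current estimate of the value of alternative \(x\), \(\mu'\) the estimate after one further observation of \(x\), and \(\tau\) the number of decisions remaining. The footnote above concerns which dispersion estimate, the RMSE itself or its change, enters the computation of this index.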

References

  1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  2. Lagoudakis, M.G., Parr, R.: Model-free least squares policy iteration. Technical report, Computer Science Department, Duke University, Durham, North Carolina, United States (2003)

  3. Xu, X., Hu, D., Lu, X.: Kernel-based least squares policy iteration for reinforcement learning. IEEE Trans. Neural Netw. 18(4), 973–992 (2007)

  4. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)

  5. Engel, Y., Mannor, S., Meir, R.: The kernel recursive least-squares algorithm. IEEE Trans. Signal Process. 52(8), 2275–2285 (2004)

  6. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Online least-squares policy iteration for reinforcement learning control. In: American Control Conference (ACC), pp. 486–491 (2010)

  7. Li, L., Littman, M.L., Mansley, C.R.: Online exploration in least-squares policy iteration. Technical report, Rutgers University (2008)

  8. Yahyaa, S., Manderick, B.: Knowledge gradient exploration in online least squares policy iteration. In: 5th International Conference on Agents and Artificial Intelligence (ICAART). Springer-Verlag, Barcelona (2013)

  9. Ryzhov, I.O., Powell, W.B., Frazier, P.I.: The knowledge-gradient policy for a general class of online learning problems. Oper. Res. 60, 180–195 (2011)

  10. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)

  11. Powell, W.B., Ryzhov, I.O.: Optimal Learning. Wiley, Hoboken (2012)

  12. Engel, Y., Meir, R.: Algorithms and representations for reinforcement learning. Technical report, Computer Science Department, The Hebrew University of Jerusalem (2005)

  13. Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: 22nd International Conference on Machine Learning (ICML), New York (2005)

  14. Koller, D., Parr, R.: Policy iteration for factored MDPs. In: 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000) (2000)

  15. Mahadevan, S.: Representation Discovery Using Harmonic Analysis. Morgan and Claypool Publishers, San Rafael (2008)

  16. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2002)

  17. Sugiyama, M., Hachiya, H., Towell, C., Vijayakumar, S.: Geodesic Gaussian kernels for value function approximation. Auton. Robots 25(3), 287–304 (2008)

  18. Yahyaa, S., Manderick, B.: Shortest path Gaussian kernels for state action graphs: an empirical study. In: 24th Benelux Conference on Artificial Intelligence (BNAIC). Maastricht University, The Netherlands (2012)

Author information

Correspondence to Saba Yahyaa.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yahyaa, S., Manderick, B. (2015). Knowledge Gradient for Online Reinforcement Learning. In: Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Agents and Artificial Intelligence. ICAART 2014. Lecture Notes in Computer Science, vol 8946. Springer, Cham. https://doi.org/10.1007/978-3-319-25210-0_7

  • DOI: https://doi.org/10.1007/978-3-319-25210-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25209-4

  • Online ISBN: 978-3-319-25210-0
