Gradient Based Algorithms with Loss Functions and Kernels for Improved On-Policy Control

Robards, Matthew; Sunehag, Peter

doi:10.1007/978-3-642-29946-9_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)

Included in the following conference series: European Workshop on Reinforcement Learning

Abstract

We introduce and empirically evaluate two novel online gradient-based reinforcement learning algorithms with function approximation, one model-based and the other model-free. These algorithms allow non-squared loss functions, which is novel in reinforcement learning and appears to bring empirical advantages. We further extend a previous gradient-based algorithm to the case of full control by using generalized policy iteration. Theoretical properties of these algorithms are studied in a companion paper.
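
As a rough illustration of what a gradient-based on-policy update with a non-squared loss can look like, the sketch below applies a semi-gradient SARSA step to a linear action-value function with a Huber loss on the temporal-difference error, and pairs it with epsilon-greedy action selection as one simple form of generalized policy iteration. This is not the authors' algorithm: the Huber loss, the linear features, and all names and parameters (`huber_grad`, `sarsa_step`, `epsilon_greedy`, `kappa`, `lr`) are assumptions chosen only for the example.

```python
import numpy as np

def huber_grad(delta, kappa=1.0):
    """Derivative of the Huber loss with respect to the TD error delta.

    Inside [-kappa, kappa] it matches the squared loss (gradient = delta);
    outside, the gradient is clipped, which limits the effect of large errors.
    """
    return float(np.clip(delta, -kappa, kappa))

def sarsa_step(w, phi, reward, phi_next, gamma=0.9, lr=0.1, kappa=1.0):
    """One semi-gradient SARSA update for Q(s, a) = w . phi(s, a).

    The bootstrapped target is treated as a constant (semi-gradient),
    so the descent direction on the loss is huber_grad(delta) * phi.
    """
    delta = reward + gamma * float(w @ phi_next) - float(w @ phi)
    return w + lr * huber_grad(delta, kappa) * phi

def epsilon_greedy(w, action_features, epsilon, rng):
    """Pick an action index: random with probability epsilon, else greedy.

    action_features has one row of features per available action.
    """
    if rng.random() < epsilon:
        return int(rng.integers(len(action_features)))
    return int(np.argmax(action_features @ w))
```

With zero initial weights and a reward of 5, the TD error is 5; the Huber gradient clips it to 1, so the step is five times smaller than the squared-loss step would be. Alternating `sarsa_step` (evaluation) with `epsilon_greedy` (improvement) gives the control loop.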

References

Baird, L., Moore, A.: Gradient descent for general reinforcement learning. In: Neural Information Processing Systems, vol. 11, pp. 968–974. MIT Press (1998)
Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: 22nd International Conference on Machine Learning (ICML 2005), Bonn, Germany, pp. 201–208 (2005)
Engel, Y., Mannor, S., Meir, R.: Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In: Proc. of the 20th International Conference on Machine Learning, pp. 154–161 (2003)
Maei, H., Szepesvári, C., Bhatnagar, S., Sutton, R.: Toward off-policy learning control with function approximation. In: Proceedings of the 27th International Conference on Machine Learning (2010)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
Robards, M., Sunehag, P.: Online convex reinforcement learning. In: Submitted to 9th EWRL (2011)
Robards, M., Sunehag, P., Sanner, S., Marthi, B.: Sparse Kernel-SARSA(λ) with an Eligibility Trace. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS, vol. 6913, pp. 1–17. Springer, Heidelberg (2011)
Sutton, R., Barto, A.: Reinforcement Learning. The MIT Press (1998)
Sutton, R., Maei, H., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., Wiewiora, E.: Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th International Conference on Machine Learning (2009)
Sutton, R., Szepesvári, C., Maei, H.: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In: NIPS, pp. 1609–1616. MIT Press (2008)

Author information

Authors and Affiliations

Australian National University, NICTA, Australia
Matthew Robards & Peter Sunehag

Editor information

Editors and Affiliations

NICTA and the Australian National University, 7 London Circuit, ACT 2601, Canberra, Australia
Scott Sanner
Research School of Computer Science, Australian National University, ACT 0200, Canberra, Australia
Marcus Hutter

Cite this paper

Robards, M., Sunehag, P. (2012). Gradient Based Algorithms with Loss Functions and Kernels for Improved On-Policy Control. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_7

DOI: https://doi.org/10.1007/978-3-642-29946-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9
eBook Packages: Computer Science, Computer Science (R0)
