Feature Reinforcement Learning in Practice

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7188)

Abstract

Following a recent surge in the use of history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called ΦMDP [13]. To create a practical algorithm, we devise a stochastic search procedure for a class of context trees, based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation of ΦMDP. Our proposed algorithm outperforms the classical U-tree algorithm [20] and the recent active-LZ algorithm [6], and is competitive with MC-AIXI-CTW [29], which maintains a Bayesian mixture over all context trees up to a chosen depth. We are encouraged that we can compete with this sophisticated method using an algorithm that simply picks a single model and runs Q-learning on the corresponding MDP. Our ΦMDP algorithm is simpler and consumes less time and memory. These results show promise for future work attacking larger and more complex problems.
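
The sketch below illustrates the kind of stochastic search the abstract describes: generic parallel tempering over a discrete model space, with one Metropolis chain per temperature and occasional swaps between adjacent chains. It is a minimal illustration, not the authors' implementation; the cost and propose callables stand in for the ΦMDP cost criterion and the specialized context-tree proposal distribution, neither of which is specified here, and all names are illustrative.

    import math
    import random

    def parallel_tempering_search(cost, propose, init_states, temperatures,
                                  n_iters, rng=random):
        """Parallel tempering over a discrete model space.

        cost(x) stands in for the Phi-MDP cost criterion and propose(x) for
        the specialized context-tree proposal; both are supplied by the
        caller and are placeholders, not the paper's definitions.
        """
        states = list(init_states)
        costs = [cost(x) for x in states]
        best, best_cost = states[0], costs[0]

        for _ in range(n_iters):
            # Metropolis step within each tempered chain.
            for k, temp in enumerate(temperatures):
                cand = propose(states[k])
                c = cost(cand)
                if c <= costs[k] or rng.random() < math.exp((costs[k] - c) / temp):
                    states[k], costs[k] = cand, c
                    if c < best_cost:
                        best, best_cost = cand, c
            # Attempt to swap the states of two adjacent temperatures.
            k = rng.randrange(len(temperatures) - 1)
            log_acc = ((1.0 / temperatures[k] - 1.0 / temperatures[k + 1])
                       * (costs[k] - costs[k + 1]))
            if log_acc >= 0 or rng.random() < math.exp(log_acc):
                states[k], states[k + 1] = states[k + 1], states[k]
                costs[k], costs[k + 1] = costs[k + 1], costs[k]

        # The single selected model; Q-learning would then be run on the
        # MDP this model induces.
        return best

    # Toy usage with an integer "model" and a quadratic cost, just to show
    # the search runs end to end.
    if __name__ == "__main__":
        print(parallel_tempering_search(cost=lambda x: (x - 7) ** 2,
                                        propose=lambda x: x + random.choice((-1, 1)),
                                        init_states=[0, 0, 0],
                                        temperatures=[0.5, 2.0, 8.0],
                                        n_iters=500))

The low-temperature chain behaves almost greedily while the high-temperature chains explore and feed it better models through the swaps, which is the usual motivation for tempering over a single simulated-annealing run when the cost landscape is multimodal.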

References

  1. Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723 (1974)

  2. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)

  3. Brafman, R.I., Tennenholtz, M.: R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213–231 (2002)

  4. Chrisman, L.: Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In: AAAI, pp. 183–188 (1992)

  5. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley and Sons (1991)

  6. Farias, V., Moallemi, C., Van Roy, B., Weissman, T.: Universal reinforcement learning. IEEE Transactions on Information Theory 56(5), 2441–2454 (2010)

  7. Geyer, C.J.: Markov chain Monte Carlo maximum likelihood. In: Computing Science and Statistics: the 23rd Symposium on the Interface, pp. 156–163. Interface Foundation, Fairfax (1991)

  8. Givan, R., Dean, T., Greig, M.: Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence 147, 163–223 (2003)

  9. Granville, V., Křivánek, M., Rasson, J.P.: Simulated annealing: A proof of convergence. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(6), 652–656 (1994)

  10. Grünwald, P.D.: The Minimum Description Length Principle. The MIT Press (2007)

  11. Hukushima, K., Nemoto, K.: Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Society of Japan 65(4), 1604–1608 (1996)

  12. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2005)

  13. Hutter, M.: Feature reinforcement learning: Part I. Unstructured MDPs. Journal of Artificial General Intelligence (2009)

  14. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 99–134 (1998)

  15. Kocsis, L., Szepesvári, C.: Bandit Based Monte-Carlo Planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)

  16. Li, L., Walsh, T.J., Littman, M.L.: Towards a unified theory of state abstraction for MDPs. In: Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (2006)

  17. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Heidelberg (2001)

  18. Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence 147, 5–34 (2003)

  19. Mahmud, M.M.H.: Constructing states for reinforcement learning. In: Fürnkranz, J., Joachims, T. (eds.) Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel, pp. 727–734 (June 2010), http://www.icml2010.org/papers/593.pdf

  20. McCallum, A.K.: Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, Department of Computer Science, University of Rochester (1996)

  21. Nguyen, P., Sunehag, P., Hutter, M.: Feature reinforcement learning in practice. Tech. rep., Australian National University (2011)

  22. Poland, J., Hutter, M.: Universal learning of repeated matrix games. In: Proc. 15th Annual Machine Learning Conf. of Belgium and The Netherlands (Benelearn 2006), pp. 7–14. Ghent (2006), http://arxiv.org/abs/cs.LG/0508073

  23. Rissanen, J.: A universal data compression system. IEEE Transactions on Information Theory 29(5), 656–663 (1983)

  24. Schneider, J., Kirkpatrick, S.: Stochastic Optimization, 1st edn. Springer, Heidelberg (2006)

  25. Singh, S.P., James, M.R., Rudary, M.R.: Predictive state representations: A new theory for modeling dynamical systems. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, Banff, Canada, pp. 512–518 (2004)

  26. Suman, B., Kumar, P.: A survey of simulated annealing as a tool for single and multiobjective optimization. Journal of the Operational Research Society 57, 1143–1160 (2006)

  27. Sunehag, P., Hutter, M.: Consistency of Feature Markov Processes. In: Hutter, M., Stephan, F., Vovk, V., Zeugmann, T. (eds.) ALT 2010. LNCS (LNAI), vol. 6331, pp. 360–374. Springer, Heidelberg (2010)

  28. Sutton, R., Barto, A.: Reinforcement Learning. The MIT Press (1998)

  29. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI approximation. Journal of Artificial Intelligence Research 40(1), 95–142 (2011)

  30. Vidal, E., Thollard, F., Higuera, C.D.L., Casacuberta, F., Carrasco, R.C.: Probabilistic finite-state machines. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(7), 1013–1025 (2005)

  31. Wallace, C.S.: Statistical and Inductive Inference by Minimum Message Length. Springer, Berlin (2005)

  32. Willems, F.M.J., Shtarkov, Y.M., Tjalkens, T.J.: The context tree weighting method: Basic properties. IEEE Transactions on Information Theory 41, 653–664 (1995)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, P., Sunehag, P., Hutter, M. (2012). Feature Reinforcement Learning in Practice. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science (LNAI), vol. 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_10

  • DOI: https://doi.org/10.1007/978-3-642-29946-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer Science (R0)
