
Intrinsically Motivated Lifelong Exploration in Reinforcement Learning

  • Conference paper
  • In: Advances in Artificial Intelligence (JSAI 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1357)


Abstract

Long-horizon exploration remains a challenging problem in deep reinforcement learning, especially when an environment provides sparse or poorly defined extrinsic rewards. To tackle this challenge, we propose a reinforcement learning agent that solves hard exploration tasks by leveraging a lifelong exploration bonus. Our method decomposes this bonus into a short-term and a long-term intrinsic reward. The former handles local exploration, i.e., exploring the consequences of short-term decisions, while the latter explicitly encourages deep exploration strategies by remaining large throughout the training process. As a formulation of intrinsic novelty, we propose to measure the reconstruction error of an observation given its context, which captures flexible exploration behaviors over different time horizons. We demonstrate the effectiveness of our approach in visually rich environments from Minigrid, DMLab, and the Atari suite. Experimental results show that our method outperforms baselines on most tasks in terms of both score and exploration efficiency.
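
To make the idea concrete, the sketch below shows one way a context-conditioned reconstruction bonus could be computed. It is a minimal illustration under stated assumptions, not the authors' implementation: the class ContextReconstructor, the function intrinsic_bonus, the mixing weight alpha, and the square-root transform used as a slowly decaying long-term term are all hypothetical choices introduced for this example.

```python
import torch
import torch.nn as nn


class ContextReconstructor(nn.Module):
    """Reconstructs the current observation from its context
    (the k previous observations); a high reconstruction error
    is read as novelty."""

    def __init__(self, obs_dim: int, context_len: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * context_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_len, obs_dim) -> flatten, then reconstruct
        return self.net(context.flatten(start_dim=1))


def intrinsic_bonus(model: ContextReconstructor,
                    context: torch.Tensor,
                    obs: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Mix a short-term and a long-term term derived from the same
    reconstruction error. Reusing one error for both terms is an
    assumption made for brevity; the paper keeps a separate long-term
    component so that it stays large throughout training."""
    with torch.no_grad():
        recon = model(context)
        error = ((recon - obs) ** 2).mean(dim=-1)  # per-sample MSE
    short_term = error                    # local, fast-decaying novelty
    long_term = torch.sqrt(error + 1e-8)  # slower-decaying stand-in (assumption)
    return alpha * short_term + (1.0 - alpha) * long_term
```

As a usage note, the resulting bonus would typically be added to the extrinsic reward, e.g. r_total = r_ext + beta * intrinsic_bonus(model, context, obs), where beta is a scaling coefficient; beta and its tuning are likewise assumptions of this sketch rather than values from the paper.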



Author information


Corresponding author

Correspondence to Nicolas Bougie.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bougie, N., Ichise, R. (2021). Intrinsically Motivated Lifelong Exploration in Reinforcement Learning. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_10

