
Intrinsically Motivated Lifelong Exploration in Reinforcement Learning

  • Conference paper
  • In: Advances in Artificial Intelligence (JSAI 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1357)


Abstract

Long-horizon exploration remains a challenging problem in deep reinforcement learning, especially when an environment provides sparse or poorly defined extrinsic rewards. To tackle this challenge, we propose a reinforcement learning agent that solves hard exploration tasks by leveraging a lifelong exploration bonus. Our method decomposes this bonus into a short-term and a long-term intrinsic reward. The former handles local exploration, i.e., exploring the consequences of short-term decisions, while the latter explicitly encourages deep exploration strategies by remaining large throughout the training process. As a formulation of intrinsic novelty, we propose to measure the reconstruction error of an observation given its context, which captures flexible exploration behaviors over different time horizons. We demonstrate the effectiveness of our approach in visually rich environments from Minigrid, DMLab, and the Atari suite. Experimental results show that our method outperforms baselines on most tasks in terms of both score and exploration efficiency.
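
To make the idea concrete, the sketch below shows one way a context-conditioned reconstruction bonus could be computed. It is a minimal illustration under stated assumptions, not the authors' implementation: the class ContextReconstructor, the function intrinsic_bonus, the mixing weight alpha, and the square-root transform used as a slowly decaying long-term term are all hypothetical choices introduced for this example.

```python
import torch
import torch.nn as nn


class ContextReconstructor(nn.Module):
    """Reconstructs the current observation from its context
    (the k previous observations); a high reconstruction error
    is read as novelty."""

    def __init__(self, obs_dim: int, context_len: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * context_len, hidden),
            nn.ReLU(),
            nn.Linear(hidden, obs_dim),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_len, obs_dim) -> flatten, then reconstruct
        return self.net(context.flatten(start_dim=1))


def intrinsic_bonus(model: ContextReconstructor,
                    context: torch.Tensor,
                    obs: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Mix a short-term and a long-term term derived from the same
    reconstruction error. Reusing one error for both terms is an
    assumption made for brevity; the paper keeps a separate long-term
    component so that it stays large throughout training."""
    with torch.no_grad():
        recon = model(context)
        error = ((recon - obs) ** 2).mean(dim=-1)  # per-sample MSE
    short_term = error                    # local, fast-decaying novelty
    long_term = torch.sqrt(error + 1e-8)  # slower-decaying stand-in (assumption)
    return alpha * short_term + (1.0 - alpha) * long_term
```

As a usage note, the resulting bonus would typically be added to the extrinsic reward, e.g. r_total = r_ext + beta * intrinsic_bonus(model, context, obs), where beta is a scaling coefficient; beta and its tuning are likewise assumptions of this sketch rather than values from the paper.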



Author information


Corresponding author

Correspondence to Nicolas Bougie.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bougie, N., Ichise, R. (2021). Intrinsically Motivated Lifelong Exploration in Reinforcement Learning. In: Yada, K., et al. Advances in Artificial Intelligence. JSAI 2020. Advances in Intelligent Systems and Computing, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-73113-7_10

