Temporal Memory Sharing in Visual Reinforcement Learning

Chapter in Genetic Programming Theory and Practice XVII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Video games provide a well-defined testing ground for the development of behavioural agents that learn through trial-and-error interaction with their environment, or reinforcement learning (RL). They cover a diverse range of environments that are designed to be challenging for humans, all through a high-dimensional visual interface. Tangled Program Graphs (TPG) is a recently proposed genetic programming algorithm that emphasizes emergent modularity (i.e. the automatic construction of multi-agent organisms) in order to build successful RL agents more efficiently than state-of-the-art solutions from other subfields of artificial intelligence, e.g. deep neural networks. However, TPG organisms represent a direct mapping from input to output, with no mechanism to integrate past experience (previous inputs). This is a limitation in environments with partial observability. For example, TPG performed poorly in video games that explicitly require the player to predict the trajectory of a moving object. To make such predictions, players must identify, store, and reuse important parts of past experience. In this work, we describe an approach to supporting this type of short-term temporal memory in TPG and show that memory shared among subsets of agents within the same organism appears particularly important. In addition, we introduce heterogeneous TPG organisms composed of agents with distinct types of representation that collaborate through shared memory. In this study, heterogeneous organisms provide a parsimonious way to support agents with task-specific functionality, image-processing capabilities in the case of this work. Taken together, these extensions allow TPG to discover high-scoring behaviours for the Atari game Breakout, an environment in which it previously failed to make significant progress.
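
To make the shared-memory idea concrete, the sketch below shows one way short-term temporal memory could be shared among simple programs within a single organism. It is a minimal illustration in the spirit of TPG: the names (SharedMemory, Program, Organism), the bid-based action selection, and the probabilistic write rule are assumptions made for exposition, not the chapter's exact implementation.

```python
# Hypothetical sketch: temporal memory shared by all programs in one
# organism. Everything here is illustrative, not the chapter's method.
import random


class SharedMemory:
    """A small register bank that persists across timesteps and is
    readable and writable by every program in the organism."""

    def __init__(self, size=8):
        self.registers = [0.0] * size

    def read(self, index):
        return self.registers[index % len(self.registers)]

    def write(self, index, value, p_write=0.5):
        # Probabilistic writes (an assumption here) keep every program
        # from overwriting the buffer on every single step.
        if random.random() < p_write:
            self.registers[index % len(self.registers)] = value


class Program:
    """A toy linear program: its bid mixes one pixel of the current
    frame with one shared register, then writes the bid back."""

    def __init__(self, n_inputs, memory):
        self.memory = memory
        self.pixel = random.randrange(n_inputs)
        self.slot = random.randrange(len(memory.registers))
        self.w_obs = random.uniform(-1.0, 1.0)
        self.w_mem = random.uniform(-1.0, 1.0)

    def execute(self, observation):
        # Reading the shared register lets the bid depend on state
        # left behind by earlier frames, not just the current input.
        bid = (self.w_obs * observation[self.pixel]
               + self.w_mem * self.memory.read(self.slot))
        self.memory.write(self.slot, bid)
        return bid


class Organism:
    """Programs bid on each frame; the highest bidder's action wins.
    All programs share one temporal memory buffer."""

    def __init__(self, n_programs, n_inputs, actions):
        self.memory = SharedMemory()
        self.programs = [Program(n_inputs, self.memory)
                         for _ in range(n_programs)]
        self.actions = [random.choice(actions) for _ in self.programs]

    def act(self, observation):
        bids = [p.execute(observation) for p in self.programs]
        return self.actions[bids.index(max(bids))]


# One decision for one (random stand-in) frame.
organism = Organism(n_programs=6, n_inputs=84 * 64, actions=[0, 1, 2, 3])
frame = [random.random() for _ in range(84 * 64)]
print(organism.act(frame))
```

Because the register bank outlives any single frame, a bid at time t can depend on values written at time t-1, which is exactly the kind of short-term state an agent needs in order to, for example, infer the direction of a moving ball in Breakout.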

Notes

  1. This screen resolution corresponds to 40% of the raw Atari screen resolution. TPG has previously been shown to operate at the full Atari screen resolution [21]. The focus of this study is temporal memory, and downsampling is used here to speed up empirical evaluations (one possible scheme is sketched after these notes).

  2. An additional 10 runs were conducted for this analysis, relative to the 10 runs summarized in Fig. 6.4a.
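
For concreteness, the following is a minimal sketch of how such downsampling might be implemented. The nearest-neighbour sampling, the `downsample` helper, and the per-axis reading of "40%" (mapping a 210 x 160 frame to 84 x 64) are all assumptions; the chapter's actual preprocessing may differ.

```python
# Minimal downsampling sketch; the sampling scheme and the per-axis
# scale interpretation are assumptions, not the chapter's method.
import numpy as np


def downsample(frame, scale=0.4):
    """Nearest-neighbour subsampling of a 2-D grayscale frame to
    `scale` of its resolution along each axis."""
    h, w = frame.shape
    rows = (np.arange(int(h * scale)) / scale).astype(int)
    cols = (np.arange(int(w * scale)) / scale).astype(int)
    return frame[np.ix_(rows, cols)]


# A random stand-in for a 210x160 Atari screen shrinks to 84x64.
frame = np.random.randint(0, 256, size=(210, 160), dtype=np.uint8)
assert downsample(frame).shape == (84, 64)
```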

References

  1. Simon, H.A.: The architecture of complexity. Proceedings of the American Philosophical Society 106, 467–482 (1962)

  2. Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: G. Squillero, P. Burelli (eds.) Applications of Evolutionary Computation, pp. 19–34. Springer International Publishing (2016)

  3. Atkins, D., Neshatian, K., Zhang, M.: A domain independent genetic programming approach to automatic feature extraction for image classification. In: 2011 IEEE Congress on Evolutionary Computation (CEC), pp. 238–245 (2011)

  4. Beattie, C., Leibo, J.Z., Teplyashin, D., Ward, T., Wainwright, M., Küttler, H., Lefrancq, A., Green, S., Valdés, V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., Legg, S., Petersen, S.: DeepMind Lab. arXiv preprint arXiv:1612.03801 (2016)

  5. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279 (2013)

  6. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag (2006)

  7. Brameier, M., Banzhaf, W.: Linear Genetic Programming, 1st edn. Springer (2007)

  8. Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the 1st Annual Conference on Genetic Programming, pp. 261–266. MIT Press (1996)

  9. Choi, S.P.M., Yeung, D.Y., Zhang, N.L.: An environment model for nonstationary reinforcement learning. In: S.A. Solla, T.K. Leen, K. Müller (eds.) Advances in Neural Information Processing Systems 12, pp. 987–993. MIT Press (2000)

  10. Conrads, M., Nordin, P., Banzhaf, W.: Speech sound discrimination with genetic programming. In: W. Banzhaf, R. Poli, M. Schoenauer, T.C. Fogarty (eds.) Genetic Programming, pp. 113–129. Springer Berlin Heidelberg (1998)

  11. Davis, R.L., Zhong, Y.: The Biology of Forgetting – A Perspective. Neuron 95(3), 490–503 (2017)

  12. Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16, pp. 117–124. ACM (2016)

  13. van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 2094–2100. AAAI Press (2016)

  14. Hausknecht, M., Lehman, J., Miikkulainen, R., Stone, P.: A neuroevolution approach to general Atari game playing. IEEE Transactions on Computational Intelligence and AI in Games 6(4), 355–366 (2014)

  15. Haynes, T.D., Wainwright, R.L.: A simulation of adaptive agents in a hostile environment. In: Proceedings of the 1995 ACM Symposium on Applied Computing, SAC ’95, pp. 318–323. ACM (1995)

  16. Hintze, A., Edlund, J.A., Olson, R.S., Knoester, D.B., Schossau, J., Albantakis, L., Tehrani-Saleh, A., Kvam, P.D., Sheneman, L., Goldsby, H., Bohm, C., Adami, C.: Markov brains: A technical introduction. arXiv preprint arXiv:1709.05601 (2017)

  17. Hintze, A., Schossau, J., Bohm, C.: The evolutionary buffet method. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, Genetic and Evolutionary Computation Series, pp. 17–36. Springer (2018)

  18. Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science 364(6443), 859–865 (2019)

  19. Kelly, S.: Scaling genetic programming to challenging reinforcement tasks through emergent modularity. Ph.D. thesis, Faculty of Computer Science, Dalhousie University (2018)

  20. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)

  21. Kelly, S., Smith, R.J., Heywood, M.I.: Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, pp. 37–57. Springer International Publishing (2019)

  22. Kober, J., Peters, J.: Reinforcement learning in robotics: A survey. In: M. Wiering, M. van Otterlo (eds.) Reinforcement Learning, pp. 579–610. Springer (2012)

  23. Koza, J.R., Andre, D., Bennett, F.H., Keane, M.A.: Genetic Programming III: Darwinian Invention & Problem Solving, 1st edn. Morgan Kaufmann Publishers Inc. (1999)

  24. Krawiec, K., Bhanu, B.: Visual learning by coevolutionary feature synthesis. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 35(3), 409–425 (2005)

  25. Lalejini, A., Ofria, C.: What Else Is in an Evolved Name? Exploring Evolvable Specificity with SignalGP. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, pp. 103–121. Springer International Publishing (2019)

  26. Lughofer, E., Sayed-Mouchaweh, M.: Adaptive and on-line learning in non-stationary environments. Evolving Systems 6(2), 75–77 (2015)

  27. Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61(1), 523–562 (2018)

  28. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: M.F. Balcan, K.Q. Weinberger (eds.) Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1928–1937. PMLR (2016)

  29. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 2204–2212. MIT Press (2014)

  30. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  31. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. In: International Conference on Learning Representations (2016)

  32. Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: L. Sekanina, T. Hu, N. Lourenço, H. Richter, P. García-Sánchez (eds.) Genetic Programming, pp. 162–177. Springer International Publishing (2019)

  33. Stanley, K.O., Miikkulainen, R.: Evolving a Roving Eye for Go. In: K. Deb et al. (eds.) Genetic and Evolutionary Computation – GECCO 2004, Lecture Notes in Computer Science, vol. 3103, pp. 1226–1238. Springer Berlin Heidelberg (2004)

  34. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)

  35. Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence, vol. 1, pp. 136–141 (1994)

  36. Wagner, G.P., Altenberg, L.: Perspective: Complex adaptations and the evolution of evolvability. Evolution 50(3), 967–976 (1996)

  37. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML’16, pp. 1995–2003. JMLR.org (2016)

  38. Watson, R.A., Pollack, J.B.: Modular interdependency in complex dynamical systems. Artificial Life 11(4), 445–457 (2005)

  39. Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’18, pp. 229–236. ACM (2018)

Acknowledgements

Stephen Kelly gratefully acknowledges support from the NSERC Postdoctoral Fellowship program. Computational resources for this research were provided by Michigan State University through the Institute for Cyber-Enabled Research (https://icer.msu.edu) and Compute Canada (https://computecanada.ca).

Author information

Correspondence to Stephen Kelly.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kelly, S., Banzhaf, W. (2020). Temporal Memory Sharing in Visual Reinforcement Learning. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_6

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0

  • eBook Packages: Computer Science (R0)
