Temporal Memory Sharing in Visual Reinforcement Learning

Chapter in Genetic Programming Theory and Practice XVII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Video games provide a well-defined testing ground for the development of behavioural agents that learn through trial-and-error interaction with their environment, or reinforcement learning (RL). They cover a diverse range of environments that are designed to be challenging for humans, all through a high-dimensional visual interface. Tangled Program Graphs (TPG) is a recently proposed genetic programming algorithm that emphasizes emergent modularity (i.e. the automatic construction of multi-agent organisms) in order to build successful RL agents more efficiently than state-of-the-art solutions from other subfields of artificial intelligence, e.g. deep neural networks. However, TPG organisms represent a direct mapping from input to output, with no mechanism to integrate past experience (previous inputs). This is a limitation in environments with partial observability. For example, TPG performed poorly in video games that explicitly require the player to predict the trajectory of a moving object. To make such predictions, players must identify, store, and reuse important parts of past experience. In this work, we describe an approach to supporting this type of short-term temporal memory in TPG and show that memory shared among subsets of agents within the same organism appears particularly important. In addition, we introduce heterogeneous TPG organisms composed of agents with distinct types of representation that collaborate through shared memory. In this study, heterogeneous organisms provide a parsimonious way to support agents with task-specific functionality, image-processing capabilities in the case of this work. Taken together, these extensions allow TPG to discover high-scoring behaviours for the Atari game Breakout, an environment in which it previously failed to make significant progress.
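
To make the shared-memory idea concrete, the sketch below shows one way short-term temporal memory could be shared among simple programs within a single organism. It is a minimal illustration in the spirit of TPG: the names (SharedMemory, Program, Organism), the bid-based action selection, and the probabilistic write rule are assumptions made for exposition, not the chapter's exact implementation.

```python
# Hypothetical sketch: temporal memory shared by all programs in one
# organism. Everything here is illustrative, not the chapter's method.
import random


class SharedMemory:
    """A small register bank that persists across timesteps and is
    readable and writable by every program in the organism."""

    def __init__(self, size=8):
        self.registers = [0.0] * size

    def read(self, index):
        return self.registers[index % len(self.registers)]

    def write(self, index, value, p_write=0.5):
        # Probabilistic writes (an assumption here) keep every program
        # from overwriting the buffer on every single step.
        if random.random() < p_write:
            self.registers[index % len(self.registers)] = value


class Program:
    """A toy linear program: its bid mixes one pixel of the current
    frame with one shared register, then writes the bid back."""

    def __init__(self, n_inputs, memory):
        self.memory = memory
        self.pixel = random.randrange(n_inputs)
        self.slot = random.randrange(len(memory.registers))
        self.w_obs = random.uniform(-1.0, 1.0)
        self.w_mem = random.uniform(-1.0, 1.0)

    def execute(self, observation):
        # Reading the shared register lets the bid depend on state
        # left behind by earlier frames, not just the current input.
        bid = (self.w_obs * observation[self.pixel]
               + self.w_mem * self.memory.read(self.slot))
        self.memory.write(self.slot, bid)
        return bid


class Organism:
    """Programs bid on each frame; the highest bidder's action wins.
    All programs share one temporal memory buffer."""

    def __init__(self, n_programs, n_inputs, actions):
        self.memory = SharedMemory()
        self.programs = [Program(n_inputs, self.memory)
                         for _ in range(n_programs)]
        self.actions = [random.choice(actions) for _ in self.programs]

    def act(self, observation):
        bids = [p.execute(observation) for p in self.programs]
        return self.actions[bids.index(max(bids))]


# One decision for one (random stand-in) frame.
organism = Organism(n_programs=6, n_inputs=84 * 64, actions=[0, 1, 2, 3])
frame = [random.random() for _ in range(84 * 64)]
print(organism.act(frame))
```

Because the register bank outlives any single frame, a bid at time t can depend on values written at time t-1, which is exactly the kind of short-term state an agent needs in order to, for example, infer the direction of a moving ball in Breakout.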

Notes

  1. This screen resolution corresponds to 40% of the raw Atari screen resolution. TPG has previously been shown to operate at the full Atari screen resolution [21]. The focus of this study is temporal memory, and downsampling is used here to speed up empirical evaluations (one possible scheme is sketched after these notes).

  2. An additional 10 runs were conducted for this analysis, relative to the 10 runs summarized in Fig. 6.4a.
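
For concreteness, the following is a minimal sketch of how such downsampling might be implemented. The nearest-neighbour sampling, the `downsample` helper, and the per-axis reading of "40%" (mapping a 210 x 160 frame to 84 x 64) are all assumptions; the chapter's actual preprocessing may differ.

```python
# Minimal downsampling sketch; the sampling scheme and the per-axis
# scale interpretation are assumptions, not the chapter's method.
import numpy as np


def downsample(frame, scale=0.4):
    """Nearest-neighbour subsampling of a 2-D grayscale frame to
    `scale` of its resolution along each axis."""
    h, w = frame.shape
    rows = (np.arange(int(h * scale)) / scale).astype(int)
    cols = (np.arange(int(w * scale)) / scale).astype(int)
    return frame[np.ix_(rows, cols)]


# A random stand-in for a 210x160 Atari screen shrinks to 84x64.
frame = np.random.randint(0, 256, size=(210, 160), dtype=np.uint8)
assert downsample(frame).shape == (84, 64)
```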

References

  1. Simon, H.A.: The architecture of complexity. Proceedings of the American Philosophical Society 106, 467–482 (1962)

  2. Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: G. Squillero, P. Burelli (eds.) Applications of Evolutionary Computation, pp. 19–34. Springer International Publishing (2016)

  3. Atkins, D., Neshatian, K., Zhang, M.: A domain independent genetic programming approach to automatic feature extraction for image classification. In: 2011 IEEE Congress on Evolutionary Computation (CEC), pp. 238–245 (2011)

  4. Beattie, C., Leibo, J.Z., Teplyashin, D., Ward, T., Wainwright, M., Küttler, H., Lefrancq, A., Green, S., Valdés, V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., Legg, S., Petersen, S.: DeepMind Lab. arXiv preprint arXiv:1612.03801 (2016)

  5. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279 (2013)

  6. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag (2006)

  7. Brameier, M., Banzhaf, W.: Linear Genetic Programming, 1st edn. Springer (2007)

  8. Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the 1st Annual Conference on Genetic Programming, pp. 261–266. MIT Press (1996)

  9. Choi, S.P.M., Yeung, D.Y., Zhang, N.L.: An environment model for nonstationary reinforcement learning. In: S.A. Solla, T.K. Leen, K. Müller (eds.) Advances in Neural Information Processing Systems 12, pp. 987–993. MIT Press (2000)

  10. Conrads, M., Nordin, P., Banzhaf, W.: Speech sound discrimination with genetic programming. In: W. Banzhaf, R. Poli, M. Schoenauer, T.C. Fogarty (eds.) Genetic Programming, pp. 113–129. Springer Berlin Heidelberg (1998)

  11. Davis, R.L., Zhong, Y.: The Biology of Forgetting – A Perspective. Neuron 95(3), 490–503 (2017)

  12. Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16, pp. 117–124. ACM (2016)

  13. van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 2094–2100. AAAI Press (2016)

  14. Hausknecht, M., Lehman, J., Miikkulainen, R., Stone, P.: A neuroevolution approach to general Atari game playing. IEEE Transactions on Computational Intelligence and AI in Games 6(4), 355–366 (2014)

  15. Haynes, T.D., Wainwright, R.L.: A simulation of adaptive agents in a hostile environment. In: Proceedings of the 1995 ACM Symposium on Applied Computing, SAC ’95, pp. 318–323. ACM (1995)

  16. Hintze, A., Edlund, J.A., Olson, R.S., Knoester, D.B., Schossau, J., Albantakis, L., Tehrani-Saleh, A., Kvam, P.D., Sheneman, L., Goldsby, H., Bohm, C., Adami, C.: Markov brains: A technical introduction. arXiv preprint arXiv:1709.05601 (2017)

  17. Hintze, A., Schossau, J., Bohm, C.: The evolutionary buffet method. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, Genetic and Evolutionary Computation Series, pp. 17–36. Springer (2018)

  18. Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science 364(6443), 859–865 (2019)

  19. Kelly, S.: Scaling genetic programming to challenging reinforcement tasks through emergent modularity. Ph.D. thesis, Faculty of Computer Science, Dalhousie University (2018)

  20. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)

  21. Kelly, S., Smith, R.J., Heywood, M.I.: Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, pp. 37–57. Springer International Publishing (2019)

  22. Kober, J., Peters, J.: Reinforcement learning in robotics: A survey. In: M. Wiering, M. van Otterlo (eds.) Reinforcement Learning, pp. 579–610. Springer (2012)

  23. Koza, J.R., Andre, D., Bennett, F.H., Keane, M.A.: Genetic Programming III: Darwinian Invention & Problem Solving, 1st edn. Morgan Kaufmann Publishers Inc. (1999)

  24. Krawiec, K., Bhanu, B.: Visual learning by coevolutionary feature synthesis. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 35(3), 409–425 (2005)

  25. Lalejini, A., Ofria, C.: What Else Is in an Evolved Name? Exploring Evolvable Specificity with SignalGP. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice XVI, pp. 103–121. Springer International Publishing (2019)

  26. Lughofer, E., Sayed-Mouchaweh, M.: Adaptive and on-line learning in non-stationary environments. Evolving Systems 6(2), 75–77 (2015)

  27. Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61(1), 523–562 (2018)

  28. Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., Kavukcuoglu, K.: Asynchronous methods for deep reinforcement learning. In: M.F. Balcan, K.Q. Weinberger (eds.) Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1928–1937. PMLR (2016)

  29. Mnih, V., Heess, N., Graves, A., Kavukcuoglu, K.: Recurrent models of visual attention. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 2204–2212. MIT Press (2014)

  30. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  31. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. In: International Conference on Learning Representations (2016)

  32. Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: L. Sekanina, T. Hu, N. Lourenço, H. Richter, P. García-Sánchez (eds.) Genetic Programming, pp. 162–177. Springer International Publishing (2019)

  33. Stanley, K.O., Miikkulainen, R.: Evolving a Roving Eye for Go. In: K. Deb et al. (eds.) Genetic and Evolutionary Computation – GECCO 2004, Lecture Notes in Computer Science, vol. 3103, pp. 1226–1238. Springer Berlin Heidelberg (2004)

  34. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)

  35. Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence, vol. 1, pp. 136–141 (1994)

  36. Wagner, G.P., Altenberg, L.: Perspective: Complex adaptations and the evolution of evolvability. Evolution 50(3), 967–976 (1996)

  37. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML’16, pp. 1995–2003. JMLR.org (2016)

  38. Watson, R.A., Pollack, J.B.: Modular interdependency in complex dynamical systems. Artificial Life 11(4), 445–457 (2005)

  39. Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’18, pp. 229–236. ACM (2018)

Acknowledgements

Stephen Kelly gratefully acknowledges support from the NSERC Postdoctoral Fellowship program. Computational resources for this research were provided by Michigan State University through the Institute for Cyber-Enabled Research (https://icer.msu.edu) and Compute Canada (https://computecanada.ca).

Author information

Correspondence to Stephen Kelly.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kelly, S., Banzhaf, W. (2020). Temporal Memory Sharing in Visual Reinforcement Learning. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_6

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0

  • eBook Packages: Computer Science (R0)
