Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model

Chapter in: Genetic Programming Theory and Practice XVII

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Reinforcement learning (RL) tasks have often assumed a Markov decision process, which is to say that state information is 'complete', hence there is no need to learn what to learn from. However, recent advances, such as visual reinforcement learning, have expanded the tasks typically addressed with RL to include significant amounts of partial observability. This implies that the representation needs to support multiple forms of memory; credit assignment therefore needs to find efficient ways to encode high-dimensional data, as well as to determine under what conditions to save and recall specific pieces of information, and for what purpose. In this work, we assume the tangled program graph (TPG) formulation for genetic programming, which has already demonstrated competitiveness with deep learning solutions to multiple RL tasks (under complete information). Here, TPG is augmented with indexed memory using a probabilistic formulation of the write operation (defining long- and short-term memory) and an indexed read. Moreover, the register information specific to the programs co-operating within a team is used to provide the low-dimensional encoding of state. We demonstrate that TPG can then successfully evolve a behaviour for a hero bot in the Dota 2 game engine when playing a single-lane 1-on-1 configuration with the game engine's hero bot as the opponent. Specific recommendations are made regarding the design of an appropriate fitness function. We show that TPG without indexed memory completely fails to learn any useful behaviour; only with indexed memory are useful hero behaviours discovered.
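
To make the memory mechanism concrete, the sketch below illustrates the style of probabilistic shared memory described in the abstract: a program's register state is written across a buffer of columns whose overwrite probability decays away from the centre, so frequently overwritten columns behave as short-term memory and rarely overwritten columns as long-term memory, while reads are deterministic, indexed lookups. The class name, buffer dimensions, and exponential probability profile are illustrative assumptions rather than the chapter's exact parameterization.

    import numpy as np

    class ProbabilisticSharedMemory:
        """Sketch only: the column count, register width and the exponential
        write-probability profile are assumptions, not the chapter's values."""

        def __init__(self, n_columns=100, n_registers=8, beta=0.25, seed=None):
            self.rng = np.random.default_rng(seed)
            self.n_columns = n_columns
            self.n_registers = n_registers
            # A single shared buffer; the chapter's notes indicate indexed
            # memory is initialized once, at generation zero.
            self.memory = np.zeros((n_columns, n_registers))
            # Overwrite probability decays with distance from the centre column:
            # centre columns act as short-term memory, edge columns as long-term.
            distance = np.abs(np.arange(n_columns) - (n_columns - 1) / 2.0)
            self.write_prob = np.exp(-beta * distance)

        def write(self, registers):
            # Probabilistic write: each column is independently overwritten with
            # the calling program's register vector according to its probability.
            registers = np.asarray(registers, dtype=float)
            mask = self.rng.random(self.n_columns) < self.write_prob
            self.memory[mask] = registers

        def read(self, column, register):
            # Indexed read: deterministic lookup of a single stored value.
            return self.memory[column % self.n_columns, register % self.n_registers]

    # Usage: one program flushes its registers, a later program reads a value back.
    mem = ProbabilisticSharedMemory(seed=1)
    mem.write([0.1, -0.4, 2.0, 0.0, 1.5, -1.0, 0.3, 0.9])
    recalled = mem.read(column=42, register=3)

In the TPG setting, the write would be triggered by a memory instruction executed by a program, and the vector written would come from the registers of the team's co-operating programs; the sketch only fixes the probabilistic-write and indexed-read contract.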

Notes

  1. http://blog.dota2.com/?l=english.

  2. https://openai.com/blog/how-to-train-your-openai-five/.

  3. https://openai.com/blog/openai-baselines-ppo/.

  4. The computational resources used by OpenAI are on the order of 180 years of gameplay per day.

  5. https://dota2.gamepedia.com/Heroes.

  6. LSTM is widely used as it addresses one of the potential pathologies of recurrent connectivity under gradient descent, that of vanishing gradients.

  7. Conditional instructions could change this [6].

  8. Indexed memory is initialized once at generation zero with NULL content.

  9. Up to 20,000 features are available; however, we are only interested in the case of a 1-on-1 single lane configuration of the game (as opposed to 5 heroes per team over 3 lanes).

  10. https://dota2.gamepedia.com/Shadow_Fiend.

  11. https://dota2.gamepedia.com/Creep_control_techniques.

  12. Heat map produced using [5].

References

  1. Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: EvoApplications, LNCS, vol. 9597, pp. 19–34 (2016)

  2. Aiyer, S.V.B., Niranjan, M., Fallside, F.: A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks 1(2), 204–215 (1990)

  3. Andersson, B., Nordin, P., Nordahl, M.: Reactive and memory-based genetic programming for robot control. In: European Conference on Genetic Programming, LNCS, vol. 1598, pp. 161–172 (1999)

  4. Andre, D.: Evolution of mapmaking ability: Strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)

  5. Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A., Wishart, D.S.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research (2016). http://www.heatmapper.ca/

  6. Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)

  7. Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)

  8. Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)

  9. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014)

  10. Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S.G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A.P., Hermann, K.M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)

  11. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10), 2222–2231 (2017)

  12. Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)

  13. Grossberg, S.: Content-addressable memory storage by neural networks: A general model and global Liapunov method. In: E.L. Schwartz (ed.) Computational Neuroscience, pp. 56–65. MIT Press (1990)

  14. Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: EvoApplications, LNCS, vol. 7835, pp. 529–539 (2013)

  15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)

  16. Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)

  17. Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based deep reinforcement learning. Science 364, 859–865 (2019)

  18. Kelly, S., Banzhaf, W.: Temporal memory sharing in visual reinforcement learning. In: W. Banzhaf, E. Goodman, L. Sheneman, L. Trujillo, B. Worzel (eds.) Genetic Programming Theory and Practice, vol. XVII. Springer (2020)

  19. Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)

  20. Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)

  21. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)

  22. Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: A tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice, vol. XVI, chap. 3, pp. 37–57. Springer (2019)

  23. Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic (1998)

  24. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)

  25. Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)

  26. Merrild, J., Rasmussen, M.A., Risi, S.: HyperNTM: Evolving scalable neural Turing machines through HyperNEAT. In: EvoApplications, pp. 750–766 (2018)

  27. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  28. Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press (1994)

  29. Poli, R., McPhee, N.F., Citi, L., Crane, E.: Memory with memory in genetic programming. Journal of Artificial Evolution and Applications (2009)

  30. Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)

  31. Sapienza, A., Peng, H., Ferrara, E.: Performance dynamics and success in online games. In: IEEE International Conference on Data Mining Workshops, pp. 902–909 (2017)

  32. Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)

  33. Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2019)

  34. Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)

  35. Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)

  36. Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)

  37. Teller, A.: The evolution of mental models. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press (1994)

  38. Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)

  39. Wayne, G., Hung, C.C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwińska, A., Rae, J., Mirowski, P., Leibo, J.Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hassabis, D., Lillicrap, T.: Unsupervised predictive memory in a goal-directed agent. CoRR abs/1803.10760 (2018)

  40. Wydmuch, M., Kempka, M., Jaśkowski, W.: ViZDoom competitions: Playing Doom from pixels. IEEE Transactions on Games (2019, to appear)

Acknowledgements

We gratefully acknowledge support from the NSERC CRD program (Canada).

Author information

Correspondence to Malcolm I. Heywood.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Smith, R.J., Heywood, M.I. (2020). Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_17

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0

  • eBook Packages: Computer Science (R0)
