Investigating Partner Diversification Methods in Cooperative Multi-agent Deep Reinforcement Learning

Charakorn, Rujikorn; Manoonpong, Poramate; Dilokthanakul, Nat

doi:10.1007/978-3-030-63823-8_46

Rujikorn Charakorn, Poramate Manoonpong & Nat Dilokthanakul

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1333)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

Overfitting to learning partners is a known problem in multi-agent reinforcement learning (MARL), caused by the co-evolution of learning agents. Previous works explicitly add diversity to learning partners to mitigate this problem. However, since there are many approaches for introducing diversity, it is not clear which one should be used under what circumstances. In this work, we clarify the situation and reveal that widely used methods such as partner sampling and population-based training are unreliable at introducing diversity in fully cooperative multi-agent Markov decision processes. We find that generating pre-trained partners is a simple yet effective procedure to achieve diversity. Finally, we highlight the impact of diversified learning partners on the generalization of learning agents using cross-play and ad-hoc team performance as evaluation metrics.
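The cross-play metric named in the abstract can be illustrated with a small sketch. It is not taken from the paper: the toy matching game, the representation of an agent as a single fixed action, and the helper names `episode_return`, `cross_play_matrix`, and `summarize` are all assumptions made for illustration. The idea shown is the common one, in which agents from independent training runs are paired with each other, and the off-diagonal (cross-play) scores are compared with the diagonal (self-play) scores.

```python
from statistics import mean

def episode_return(action_a, action_b):
    """Hypothetical fully cooperative toy game: both agents score 1 if their
    actions match and 0 otherwise. A real evaluation would instead average
    the returns of many environment rollouts."""
    return 1.0 if action_a == action_b else 0.0

def cross_play_matrix(agents):
    """Entry [i][j] is the return when agent i is paired with agent j.
    Each agent is represented only by the convention (action) it converged to."""
    return [[episode_return(a, b) for b in agents] for a in agents]

def summarize(matrix):
    """Return (mean self-play score, mean cross-play score)."""
    n = len(matrix)
    self_play = mean(matrix[i][i] for i in range(n))
    cross_play = mean(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return self_play, cross_play

# Three hypothetical agents from independent runs: two settled on
# convention 0 and one on convention 1.
matrix = cross_play_matrix([0, 0, 1])
self_play, cross_play = summarize(matrix)
print(f"self-play: {self_play:.2f}, cross-play: {cross_play:.2f}")
```

In this toy setting every agent coordinates perfectly with a copy of itself, yet the cross-play average is much lower because the conventions disagree. A gap of this kind is what overfitting to training partners looks like under such a metric.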

Author information

Authors and Affiliations

Bio-inspired Robotics & Neural Engineering Lab, School of Information Science & Technology, Vidyasirimedhi Institute of Science and Technology, Rayong, Thailand
Rujikorn Charakorn, Poramate Manoonpong & Nat Dilokthanakul
Embodied Artificial Intelligence and Neurorobotics Lab, The Mærsk Mc-Kinney Møller Institute, University of Southern Denmark, Odense, Denmark
Poramate Manoonpong

Corresponding author

Correspondence to Nat Dilokthanakul.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

Cite this paper

Charakorn, R., Manoonpong, P., Dilokthanakul, N. (2020). Investigating Partner Diversification Methods in Cooperative Multi-agent Deep Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1333. Springer, Cham. https://doi.org/10.1007/978-3-030-63823-8_46

DOI: https://doi.org/10.1007/978-3-030-63823-8_46
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63822-1
Online ISBN: 978-3-030-63823-8
eBook Packages: Computer Science, Computer Science (R0)
