Independent Deep Deterministic Policy Gradient Reinforcement Learning in Cooperative Multiagent Pursuit Games

Zhou, Shiyang; Ren, Weiya; Ren, Xiaoguang; Wang, Yanzhen; Yi, Xiaodong

doi:10.1007/978-3-030-86380-7_51

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12894)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

In this paper, we study a fully decentralized multi-agent pursuit problem in an environment without communication. Fully decentralized learning (decentralized training and decentralized execution) is more robust and scalable than centralized training with decentralized execution (CTDE), currently the most popular paradigm in multi-agent reinforcement learning. Both centralized training and communication mechanisms require a large amount of information exchange between agents, a strong assumption that is difficult to meet in practice. However, traditional fully decentralized multi-agent reinforcement learning methods (e.g., IQL) struggle to converge stably because the other agents' strategies change dynamically. Therefore, we extend the actor-critic framework to an actor-critic-N framework and, on this basis, propose the Potential-Field-Guided Deep Deterministic Policy Gradient (PGDDPG) method. Each agent uses a unified artificial potential field to guide its strategy updates, which reduces the uncertainty of multi-agent decision making in a complex, dynamically changing environment. As a result, the proposed PGDDPG converges quickly and stably. Finally, through pursuit experiments in MPE and CARLA, we show that our method achieves a higher success rate and more stable performance than DDPG and MADDPG.
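To make the idea of an artificial potential field concrete, the following is a minimal sketch of the textbook formulation (a quadratic attractive potential toward a target plus a short-range repulsive potential around obstacles). It is not taken from the paper: the function names `apf_force` and `apf_action`, the gains `k_att` and `k_rep`, the influence radius `d0`, and the 2-D point-agent setting are all assumptions made for illustration, and the abstract does not say how PGDDPG defines its field or combines it with the learned policy.

```python
import math

def apf_force(agent, target, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Force on a 2-D point agent from a textbook artificial potential field.

    All parameter names and default values are illustrative assumptions.
    """
    # Attractive term: negative gradient of 0.5 * k_att * ||agent - target||^2.
    fx = k_att * (target[0] - agent[0])
    fy = k_att * (target[1] - agent[1])
    for ox, oy in obstacles:
        dx, dy = agent[0] - ox, agent[1] - oy
        d = math.hypot(dx, dy)
        # Repulsion acts only inside the influence radius d0.
        if 0.0 < d < d0:
            # Negative gradient of 0.5 * k_rep * (1/d - 1/d0)^2.
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

def apf_action(agent, target, obstacles, max_speed=1.0):
    """Clip the field force to a maximum speed to obtain a velocity command."""
    fx, fy = apf_force(agent, target, obstacles)
    norm = math.hypot(fx, fy)
    if norm <= max_speed:
        return fx, fy
    scale = max_speed / norm
    return fx * scale, fy * scale

# Example: a pursuer at the origin, an evader at (3, 4), one obstacle nearby.
action = apf_action((0.0, 0.0), (3.0, 4.0), [(0.5, 0.5)])
```

In a potential-field-guided scheme, an action like this could plausibly serve as a shared reference signal during each agent's policy update, since every agent can compute it from local observations without communication; how PGDDPG actually uses the field is described in the full paper, not here.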

S. Zhou and W. Ren contributed equally to this work.

Notes

1. https://github.com/zhoushiyang12/pgddpg.git

Author information

Authors and Affiliations

Artificial Intelligence Research Center, Defense Innovation Institute, Beijing, 100072, China
Shiyang Zhou, Weiya Ren, Xiaoguang Ren, Yanzhen Wang & Xiaodong Yi
Tianjin Artificial Intelligence Innovation Center, Tianjin, 300457, China
Shiyang Zhou, Weiya Ren, Xiaoguang Ren, Yanzhen Wang & Xiaodong Yi

Authors

Shiyang Zhou
Weiya Ren
Xiaoguang Ren
Yanzhen Wang
Xiaodong Yi

Editor information

Editors and Affiliations

Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
iMotions A/S, Copenhagen, Denmark
Paolo Masulli
University of Tübingen, Tübingen, Baden-Württemberg, Germany
Sebastian Otte
Universität Hamburg, Hamburg, Germany
Stefan Wermter

About this paper

Cite this paper

Zhou, S., Ren, W., Ren, X., Wang, Y., Yi, X. (2021). Independent Deep Deterministic Policy Gradient Reinforcement Learning in Cooperative Multiagent Pursuit Games. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_51

DOI: https://doi.org/10.1007/978-3-030-86380-7_51
Published: 07 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86379-1
Online ISBN: 978-3-030-86380-7
eBook Packages: Computer Science, Computer Science (R0)
