
Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition

  • Conference paper
Database Systems for Advanced Applications. DASFAA 2022 International Workshops (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13248)


Abstract

Existing MARL algorithms are inefficient in many-agent scenarios because the dynamic interactions among agents grow exponentially in complexity with the number of agents. Mean-field theory has been introduced to improve scalability: the complex interactions are approximated by those between a single agent and the mean effect of its neighbors. However, considering only the averaged actions of the neighborhood at the last step, while ignoring the dynamic influence of individual neighbors, leads to unstable training and sub-optimal solutions. In this paper, the Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition (MFRAD) framework is proposed, which differentiates heterogeneous and hysteretic neighbor effects through a weighted mean-field approximation and reward attribution decomposition. Multi-head attention is employed to compute the weights that form the weighted mean-field Q-function. To further eliminate the impact of hysteresis information, reward attribution decomposition is integrated to decompose the weighted mean-field Q-value, improving the interpretability of MFRAD and achieving fully decentralized execution without information exchange. Two novel regularization terms are also introduced to guarantee the consistency of the temporal relationships among agents and the unambiguity of the local Q-value when no neighboring agents are present. Numerical experiments on many-agent scenarios demonstrate superior performance against existing baselines.
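The core idea sketched in the abstract, replacing the uniform mean of neighbor actions with an attention-weighted mean, can be illustrated in a few lines. The sketch below is a single-head simplification under my own assumptions (the paper uses multi-head attention, and the function and variable names here are hypothetical, not from the paper): each neighbor gets a softmax weight from the dot product of its embedding with the central agent's embedding, and the weighted mean-field action is the resulting convex combination of neighbor actions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def weighted_mean_action(query, neighbor_keys, neighbor_actions):
    # scores[j] = <key_j, query>; weights sum to 1 over neighbors
    scores = neighbor_keys @ query          # shape (N,)
    weights = softmax(scores)               # shape (N,)
    # weighted mean-field action: bar_a = sum_j w_j * a_j
    return weights @ neighbor_actions, weights

# toy example: 3 neighbors with one-hot actions over 2 discrete choices
rng = np.random.default_rng(0)
query = rng.standard_normal(4)              # embedding of the central agent
keys = rng.standard_normal((3, 4))          # embeddings of the 3 neighbors
actions = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0]])
bar_a, w = weighted_mean_action(query, keys, actions)
```

Because the weights form a convex combination, `bar_a` remains a valid action distribution, so it can be fed to a Q-network in place of the uniform neighborhood mean used by standard mean-field MARL.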


Notes

  1. Without causing confusion, we incorporate the parameters of this encoder into \(\theta ^{j}_{\mathrm {self}}\).


Acknowledgment

This work was supported in part by the National Key Research and Development Program of China (No. 2020AAA0107400), STCSM (No. 18DZ2271000 and 19ZR141420), NSFC (No. 12071145) and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Xiangfeng Wang.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wu, T., Li, W., Jin, B., Zhang, W., Wang, X. (2022). Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_22


  • DOI: https://doi.org/10.1007/978-3-031-11217-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11216-4

  • Online ISBN: 978-3-031-11217-1

  • eBook Packages: Computer Science, Computer Science (R0)
