Abstract
Existing MARL algorithms scale poorly to many-agent scenarios because the dynamics of interaction grow in complexity exponentially with the number of agents. Mean-field theory has been introduced to improve scalability by approximating the complex interactions with those between a single agent and the mean effect of its neighbors. However, considering only the averaged actions of the neighborhood at the last step, while ignoring the dynamic influence of individual neighbors, leads to unstable training and sub-optimal solutions. In this paper, we propose the Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition (MFRAD) framework, which differentiates heterogeneous and hysteretic neighbor effects through a weighted mean-field approximation and reward attribution decomposition. Multi-head attention is employed to compute the weights that form the weighted mean-field Q-function. To further eliminate the impact of hysteresis information, reward attribution decomposition is integrated to decompose the weighted mean-field Q-value, improving the interpretability of MFRAD and enabling fully decentralized execution without information exchange. Two novel regularization terms are also introduced to guarantee the consistency of the temporal relationship among agents and the unambiguity of the local Q-value in the absence of neighboring agents. Numerical experiments on many-agent scenarios demonstrate superior performance against existing baselines.
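The core idea of the weighted mean-field approximation can be illustrated with a minimal sketch. The paper itself does not publish this code; the snippet below is an assumption-laden illustration in which a focal agent scores its neighbors via scaled dot-product attention (a single head, for brevity) and replaces the uniform average of neighbor actions used in vanilla mean-field MARL with an attention-weighted mean. The encoder producing the embeddings, and the function and variable names, are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_mean_action(self_emb, neigh_embs, neigh_actions):
    """Attention-weighted mean-field approximation of neighbor actions.

    self_emb:      (d,)   embedding of the focal agent (hypothetical encoder output)
    neigh_embs:    (n, d) embeddings of the n neighbors
    neigh_actions: (n, a) one-hot (or continuous) actions of the neighbors
    Returns the (a,) weighted mean action that replaces the uniform
    neighborhood average of vanilla mean-field MARL.
    """
    d = self_emb.shape[-1]
    scores = neigh_embs @ self_emb / np.sqrt(d)  # scaled dot-product scores, shape (n,)
    weights = softmax(scores)                    # attention weights, sum to 1
    return weights @ neigh_actions               # (a,) weighted mean action

# Toy example: 3 neighbors, 2-dim embeddings, 4 discrete actions (one-hot).
rng = np.random.default_rng(0)
self_emb = rng.normal(size=2)
neigh_embs = rng.normal(size=(3, 2))
neigh_actions = np.eye(4)[[0, 2, 2]]  # neighbors took actions 0, 2, 2
mean_action = weighted_mean_action(self_emb, neigh_embs, neigh_actions)
```

Because the weights sum to one and each one-hot action row sums to one, the result is a valid probability vector over actions; the weighted mean-field Q-function is then evaluated on this vector instead of the plain average.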
Notes
- 1.
With a slight abuse of notation, we incorporate the parameters of this encoder into \(\theta ^{j}_{\mathrm {self}}\).
Acknowledgment
This work was supported in part by the National Key Research and Development Program of China (No. 2020AAA0107400), STCSM (No. 18DZ2271000 and 19ZR141420), NSFC (No. 12071145) and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wu, T., Li, W., Jin, B., Zhang, W., Wang, X. (2022). Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_22
Print ISBN: 978-3-031-11216-4
Online ISBN: 978-3-031-11217-1