
Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition

  • Conference paper
Database Systems for Advanced Applications. DASFAA 2022 International Workshops (DASFAA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13248)


Abstract

Existing MARL algorithms are inefficient in many-agent scenarios because the dynamic interactions among agents grow exponentially in complexity with the number of agents. Mean-field theory has been introduced to improve scalability: the complex interactions are approximated by those between a single agent and the mean effect of its neighbors. However, considering only the averaged actions of the neighborhood at the last step, while ignoring the dynamic influence of individual neighbors, leads to unstable training and sub-optimal solutions. In this paper, the Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition (MFRAD) framework is proposed, which differentiates heterogeneous and hysteretic neighbor effects through a weighted mean-field approximation and reward attribution decomposition. Multi-head attention is employed to compute the weights that form the weighted mean-field Q-function. To further eliminate the impact of hysteresis information, reward attribution decomposition is integrated to decompose the weighted mean-field Q-value, improving the interpretability of MFRAD and achieving fully decentralized execution without information exchange. Two novel regularization terms are also introduced to guarantee the consistency of the temporal relationships among agents and the unambiguity of the local Q-value when no neighboring agents are present. Numerical experiments on many-agent scenarios demonstrate superior performance against existing baselines.
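The core idea sketched in the abstract, replacing the uniform mean of neighbor actions with an attention-weighted mean, can be illustrated in a few lines. The sketch below is a single-head simplification under my own assumptions (the paper uses multi-head attention, and the function and variable names here are hypothetical, not from the paper): each neighbor gets a softmax weight from the dot product of its embedding with the central agent's embedding, and the weighted mean-field action is the resulting convex combination of neighbor actions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def weighted_mean_action(query, neighbor_keys, neighbor_actions):
    # scores[j] = <key_j, query>; weights sum to 1 over neighbors
    scores = neighbor_keys @ query          # shape (N,)
    weights = softmax(scores)               # shape (N,)
    # weighted mean-field action: bar_a = sum_j w_j * a_j
    return weights @ neighbor_actions, weights

# toy example: 3 neighbors with one-hot actions over 2 discrete choices
rng = np.random.default_rng(0)
query = rng.standard_normal(4)              # embedding of the central agent
keys = rng.standard_normal((3, 4))          # embeddings of the 3 neighbors
actions = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0]])
bar_a, w = weighted_mean_action(query, keys, actions)
```

Because the weights form a convex combination, `bar_a` remains a valid action distribution, so it can be fed to a Q-network in place of the uniform neighborhood mean used by standard mean-field MARL.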


Notes

  1. Without causing confusion, we incorporate the parameters of this encoder into \(\theta ^{j}_{\mathrm {self}}\).


Acknowledgment

This work was supported in part by the National Key Research and Development Program of China (No. 2020AAA0107400), STCSM (No. 18DZ2271000 and 19ZR141420), NSFC (No. 12071145) and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Xiangfeng Wang.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wu, T., Li, W., Jin, B., Zhang, W., Wang, X. (2022). Weighted Mean-Field Multi-Agent Reinforcement Learning via Reward Attribution Decomposition. In: Rage, U.K., Goyal, V., Reddy, P.K. (eds) Database Systems for Advanced Applications. DASFAA 2022 International Workshops. DASFAA 2022. Lecture Notes in Computer Science, vol 13248. Springer, Cham. https://doi.org/10.1007/978-3-031-11217-1_22


  • DOI: https://doi.org/10.1007/978-3-031-11217-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11216-4

  • Online ISBN: 978-3-031-11217-1

  • eBook Packages: Computer Science, Computer Science (R0)
