Abstract
The value an agent contributes to a team can vary widely with the heterogeneity of the team and the kind of game: cooperative, competitive, or both. Several evaluation approaches have been introduced for some of these scenarios, from homogeneous competitive multi-agent systems, which use a simple average or more sophisticated ranking protocols, to completely heterogeneous cooperative scenarios, which use the Shapley value. However, a general evaluation metric is still lacking for situations that combine cooperation with (asymmetric) competition and span varying degrees of heterogeneity, from completely homogeneous teams to completely heterogeneous teams with no repeated agents, which would help us understand whether multi-agent learning agents can adapt to this diversity. In this paper, we extend the Shapley value to handle both repeated players and competition. Because of the combinatorial explosion of team multisets and opponents, we analyse several sampling strategies and evaluate them empirically. We illustrate the new metric in a predator-prey game, showing that the gains some multi-agent reinforcement learning agents obtain in homogeneous settings are lost when they operate in heterogeneous teams.
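The abstract's starting point, the classical Shapley value, averages each player's marginal contribution over all join orders of the team; the combinatorial explosion it mentions is why sampling is needed. Below is a minimal sketch of the standard Monte Carlo permutation-sampling estimator for the classical (cooperative, distinct-player) case only, not the paper's multiset/competition extension. The `skills` table, synergy bonus, and `team_value` characteristic function are toy assumptions for illustration.

```python
import random

def shapley_monte_carlo(players, value_fn, n_samples=2000, seed=0):
    # Estimate each player's Shapley value by averaging its marginal
    # contribution value_fn(S + {p}) - value_fn(S) over uniformly
    # sampled join orders (permutations of the full team).
    rng = random.Random(seed)
    estimates = {p: 0.0 for p in players}
    for _ in range(n_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = []
        prev = value_fn(coalition)
        for p in order:
            coalition.append(p)
            cur = value_fn(coalition)
            estimates[p] += cur - prev
            prev = cur
    return {p: total / n_samples for p, total in estimates.items()}

# Toy characteristic function (an assumption, not from the paper):
# a team's value is the sum of individual skills, plus a synergy
# bonus when "a" and "b" play together.
skills = {"a": 1.0, "b": 2.0, "c": 3.0}

def team_value(coalition):
    v = sum(skills[p] for p in coalition)
    if "a" in coalition and "b" in coalition:
        v += 1.0
    return v

estimates = shapley_monte_carlo(list(skills), team_value)
```

Because every permutation's marginal contributions telescope to the grand-coalition value, the estimates always sum exactly to `team_value` of the full team (the efficiency axiom), which is a useful sanity check for any sampling strategy.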
Acknowledgements
This work has been partially supported by the EU (FEDER) and Spanish MINECO grant RTI2018-094403-B-C32 funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, Generalitat Valenciana under grant PROMETEO/2019/098, EU’s Horizon 2020 research and innovation programme under grant agreement No. 952215 (TAILOR), the EU (FEDER) and Spanish grant AEI/PID2021-122830OB-C42 (Sfera) and China Scholarship Council (CSC) scholarship (No. 202006290201). We thank the anonymous reviewers for their comments and interaction during the discussion process. All authors declare no competing interests.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, Y., Hernández-Orallo, J. (2023). Heterogeneity Breaks the Game: Evaluating Cooperation-Competition with Multisets of Agents. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2