Abstract
Multi-agent reinforcement learning has become standard practice for solving complex decision-making problems in shared environments in many scenarios. This is not the case in safety-critical scenarios, however, where the stochastic mechanisms used by the learning process could lead to highly unsafe outcomes. To address this issue, we proposed a novel, safe multi-agent reinforcement learning approach named Assured Multi-Agent Reinforcement Learning (AMARL). Unlike other safe multi-agent reinforcement learning approaches, AMARL uses quantitative verification, a model checking technique, to guarantee that agents comply with safety, performance, and non-functional requirements both during and after the learning process. We have previously evaluated AMARL in patrolling domains with various multi-agent reinforcement learning algorithms, for both homogeneous and heterogeneous systems. In this work we extend AMARL through the use of deep multi-agent reinforcement learning. This approach is particularly appropriate for systems in which rewards are sparse, and hence extends the applicability of AMARL. We evaluate our approach in a new search and collection domain, where it shows promising results in both safety and performance compared with algorithms that do not use AMARL.
Supported by the Defence Science and Technology Laboratory.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Riley, J., Calinescu, R., Paterson, C., Kudenko, D., Banks, A. (2022). Assured Deep Multi-Agent Reinforcement Learning for Safe Robotic Systems. In: Rocha, A.P., Steels, L., van den Herik, J. (eds) Agents and Artificial Intelligence. ICAART 2021. Lecture Notes in Computer Science, vol 13251. Springer, Cham. https://doi.org/10.1007/978-3-031-10161-8_8
DOI: https://doi.org/10.1007/978-3-031-10161-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10160-1
Online ISBN: 978-3-031-10161-8
eBook Packages: Computer Science (R0)