Abstract
Reinforcement learning (RL) is emerging as a powerful machine learning paradigm to develop autonomous safety critical systems; RL enables the systems to learn optimal control strategies by interacting with the environment. However, there is also widespread apprehension to deploying such systems in the real world since rigorously ensuring if they had learned safe strategies by interacting with an environment that is representative of the real world remains a challenge. Hence, there is a surge of interest to establish safety-focused RL techniques.
In this paper, we present a safety-assured training approach that augments standard RL with formal analysis and simulation technology. The benefits of coupling these techniques is three-fold: the formal analysis tools (SMT solvers) guide the system to learn strategies that rigorously uphold specified safety properties; the sophisticated simulators provide a wide-range of quantifiable, realistic learning environments; the adequacy of the safety properties can be assessed as agent explores complex environments. We illustrate this approach using a Flappy Bird game.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Flappy bird safe RL git. https://github.com/sinamf/SafeRL. Accessed 3 Aug 2019
Using keras and deep q-network to play flappy bird. https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html. Accessed 3 Aug 2019
X-plane. https://www.x-plane.com. Accessed 3 Aug 2019
Alshiekh, M., Bloem, R., Ehlers, R., Könighofer, B., Niekum, S., Topcu, U.: Safe reinforcement learning via shielding. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Barrett, C., Tinelli, C.: Satisfiability modulo theories. In: Clarke, E., Henzinger, T., Veith, H., Bloem, R. (eds.) Handbook of Model Checking, pp. 305–343. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-10575-8_11
Berkenkamp, F., Krause, A., Schoellig, A.P.: Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics (2016). arXiv preprint: arXiv:1602.04450
de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
Fulton, N., Platzer, A.: Safe reinforcement learning via formal methods: toward safe control through proof and learning. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Garcıa, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(1), 1437–1480 (2015)
Gario, M., Micheli, A.: PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms. In: Proceedings of the 13th International Workshop on Satisfiability Modulo Theories (SMT), pp. 373–384 (2015)
Hahn, E.M., Perez, M., Schewe, S., Somenzi, F., Trivedi, A., Wojtczak, D.: Omega-regular objectives in model-free reinforcement learning (2018). arXiv preprint: arXiv:1810.00950
Jansen, N., Könighofer, B., Junges, S., Bloem, R.: Shielded decision-making in MDPS (2018). arXiv preprint: arXiv:1807.06096
Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J.-P.: Safety-constrained reinforcement learning for MDPs. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS, vol. 9636, pp. 130–146. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49674-9_8
Jin Kim, H., Jordan, M.I., Sastry, S., Ng, A.Y.: Autonomous helicopter flight via reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 799–806 (2004)
Mason, G.R., Calinescu, R.C., Kudenko, D., Banks, A.: Assured reinforcement learning for safety-critical applications. In: Doctoral Consortium at the 10th International Conference on Agents and Artificial Intelligence. SciTePress (2017)
Moldovan, T.M., Abbeel, P., Jordan, M., Borrelli, F.: Safety, Risk Awareness and Exploration in Reinforcement Learning. Ph.D. thesis, University of California, Berkeley, USA (2014)
Schreiter, J., Nguyen-Tuong, D., Eberts, M., Bischoff, B., Markert, H., Toussaint, M.: Safe exploration for active learning with gaussian processes. In: Bifet, A., May, M., Zadrozny, B., Gavalda, R., Pedreschi, D., Bonchi, F., Cardoso, J., Spiliopoulou, M. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9286, pp. 133–149. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23461-8_9
Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8(3), 279–292 (1992)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Murugesan, A., Moghadamfalahi, M., Chattopadhyay, A. (2019). Formal Methods Assisted Training of Safe Reinforcement Learning Agents. In: Badger, J., Rozier, K. (eds) NASA Formal Methods. NFM 2019. Lecture Notes in Computer Science(), vol 11460. Springer, Cham. https://doi.org/10.1007/978-3-030-20652-9_22
Download citation
DOI: https://doi.org/10.1007/978-3-030-20652-9_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20651-2
Online ISBN: 978-3-030-20652-9
eBook Packages: Computer ScienceComputer Science (R0)