Formal Methods Assisted Training of Safe Reinforcement Learning Agents

Murugesan, Anitha; Moghadamfalahi, Mohammad; Chattopadhyay, Arunabh

doi:10.1007/978-3-030-20652-9_22

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11460)

Included in the following conference series:

NASA Formal Methods Symposium

Abstract

Reinforcement learning (RL) is emerging as a powerful machine learning paradigm for developing autonomous safety-critical systems; RL enables such systems to learn optimal control strategies by interacting with the environment. However, there is also widespread apprehension about deploying such systems in the real world, since it remains a challenge to rigorously ensure that they have learned safe strategies by interacting with an environment that is representative of the real world. Hence, there is a surge of interest in establishing safety-focused RL techniques.

In this paper, we present a safety-assured training approach that augments standard RL with formal analysis and simulation technology. The benefits of coupling these techniques are threefold: the formal analysis tools (SMT solvers) guide the system to learn strategies that rigorously uphold specified safety properties; the sophisticated simulators provide a wide range of quantifiable, realistic learning environments; and the adequacy of the safety properties can be assessed as the agent explores complex environments. We illustrate this approach using a Flappy Bird game.
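
The idea of letting a safety check constrain what an RL agent may try can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the one-dimensional world, the names `HEIGHT`, `ACTIONS`, `is_safe`, `safe_actions`, and `train`, the reward, and all hyperparameters are hypothetical, and the plain-Python predicate `is_safe` merely stands in for a query that the paper's approach would pose to an SMT solver.

```python
import random

HEIGHT = 10                          # hypothetical world height; safe heights are 1..HEIGHT-1
ACTIONS = {"flap": 2, "glide": -1}   # hypothetical vertical displacement per action


def is_safe(y, action):
    """Stand-in for a solver query: does the action keep the bird strictly in bounds?"""
    return 0 < y + ACTIONS[action] < HEIGHT


def safe_actions(y):
    """Return only the actions that the safety check admits at height y."""
    return [a for a in ACTIONS if is_safe(y, a)]


def train(episodes=200, steps=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning in which exploration is restricted to safe actions."""
    rng = random.Random(seed)
    q = {(y, a): 0.0 for y in range(HEIGHT + 1) for a in ACTIONS}
    visited = []
    for _ in range(episodes):
        y = HEIGHT // 2
        for _ in range(steps):
            allowed = safe_actions(y)
            if rng.random() < epsilon:
                a = rng.choice(allowed)                        # explore, but only safely
            else:
                a = max(allowed, key=lambda act: q[(y, act)])  # exploit
            y_next = y + ACTIONS[a]
            # Hypothetical reward: prefer staying near the middle of the world.
            reward = 1.0 - abs(y_next - HEIGHT // 2) / HEIGHT
            best_next = max(q[(y_next, b)] for b in safe_actions(y_next))
            q[(y, a)] += alpha * (reward + gamma * best_next - q[(y, a)])
            y = y_next
            visited.append(y)
    return q, visited
```

Because every action is filtered before it is executed, the agent in this toy setting never leaves the safe range during training, whatever the learned values are. In a realistic setting the filter would need a model of the environment dynamics, which is where a solver and a simulator would come in.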

Author information

Authors and Affiliations

Honeywell International Inc., Plymouth, USA
Anitha Murugesan & Mohammad Moghadamfalahi
Swift Navigation, San Francisco, USA
Arunabh Chattopadhyay

Corresponding author

Correspondence to Anitha Murugesan.

Editor information

Editors and Affiliations

NASA, Houston, TX, USA
Julia M. Badger
Iowa State University, Ames, IA, USA
Kristin Yvonne Rozier

About this paper

Cite this paper

Murugesan, A., Moghadamfalahi, M., Chattopadhyay, A. (2019). Formal Methods Assisted Training of Safe Reinforcement Learning Agents. In: Badger, J., Rozier, K. (eds) NASA Formal Methods. NFM 2019. Lecture Notes in Computer Science, vol 11460. Springer, Cham. https://doi.org/10.1007/978-3-030-20652-9_22

DOI: https://doi.org/10.1007/978-3-030-20652-9_22
Published: 28 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20651-2
Online ISBN: 978-3-030-20652-9
eBook Packages: Computer Science, Computer Science (R0)