SOS: Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes

Ashok, Pranav; Křetínský, Jan; Larsen, Kim Guldstrand; Le Coënt, Adrien; Taankvist, Jakob Haahr; Weininger, Maximilian

doi:10.1007/978-3-030-30281-8_9


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11785)

Included in the following conference series:

International Conference on Quantitative Evaluation of Systems


Abstract

For hybrid Markov decision processes, Stratego can compute strategies that are safe for a given safety property and (in the limit) optimal for a given cost function. Unfortunately, these strategies cannot be exported easily since they are computed as a very long list. In this paper, we demonstrate methods to learn compact representations of the strategies in the form of decision trees. These decision trees are much smaller, more understandable, and can easily be exported as code that can be loaded into embedded systems. Despite the size compression and actual differences from the original strategy, we provide guarantees on both safety and optimality of the decision-tree strategy. On top of that, we show how to obtain yet smaller representations, which are still guaranteed safe, but achieve a desired trade-off between size and optimality.
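As a rough illustration of the idea in the abstract, the following minimal Python sketch contrasts a strategy stored as a long list with a small decision tree that stays inside the allowed actions, and renders the tree as nested if/else text. Everything in it is hypothetical: the state variables `distance` and `speed`, the thresholds, the action names and the helper functions `decide`, `is_safe` and `export` are invented for this example and are not taken from the paper or from Stratego.

```python
# Toy illustration only: state variables, thresholds and action names
# are hypothetical and not taken from the paper or from Stratego.

# A "long list" strategy: one entry per (discretised) configuration,
# mapping it to the set of actions the safe strategy allows there.
safe_strategy = {
    (distance, speed): ({"brake"} if distance < 10
                        else {"brake", "hold"} if speed > 5
                        else {"brake", "hold", "accelerate"})
    for distance in range(0, 30)
    for speed in range(0, 10)
}

# A compact decision tree: inner nodes are
# (variable index, threshold, subtree if value < threshold, subtree otherwise);
# leaves are action names.
tree = (0, 10, "brake", (1, 6, "accelerate", "hold"))


def decide(node, config):
    """Walk the tree until a leaf (an action name) is reached."""
    while not isinstance(node, str):
        var, threshold, low, high = node
        node = low if config[var] < threshold else high
    return node


def is_safe(node, strategy):
    """True if the tree picks an allowed action in every listed configuration."""
    return all(decide(node, cfg) in allowed for cfg, allowed in strategy.items())


def export(node, names=("distance", "speed"), indent=""):
    """Render the tree as nested if/else source text."""
    if isinstance(node, str):
        return f'{indent}return "{node}"\n'
    var, threshold, low, high = node
    return (f"{indent}if {names[var]} < {threshold}:\n"
            + export(low, names, indent + "    ")
            + f"{indent}else:\n"
            + export(high, names, indent + "    "))
```

Here the list has 300 entries while the tree has two inner nodes; `is_safe(tree, safe_strategy)` checks the tree against every listed configuration, and `print(export(tree))` produces the if/else text. The check is exhaustive over this toy table only and says nothing about how the paper establishes its guarantees.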

This research was funded in part by TUM IGSSE project 10.06 (PARSEC), the German Research Foundation (DFG) project KR 4890/2-1 “Statistical Unbounded Verification” and the ERC Advanced Grant LASSO.


Notes

1. Note that there is a bijection between length of the run and time, as the time between each step, P, is constant.
2. i.e. a strategy that for every configuration returns a (non-strict) subset of the actions allowed by the safe strategy.
3. Entropy of a set X is \(H(X) = -\sum_{a \in A} \bigl(p_a \log_2(p_a) + (1 - p_a) \log_2(1 - p_a)\bigr)\), where \(p_a\) is the fraction of samples in X belonging to class a. See [14] for more details.
4. This is because DT learning algorithms are usually configured to avoid overfitting on the dataset.
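The entropy in Note 3 can be computed directly from a set of samples. The sketch below is one possible reading, under two assumptions that are not spelled out in the note: each sample is represented as a set of class labels (so a sample may belong to several classes), and the sum carries the usual leading minus sign so that the result is non-negative. The function name `multilabel_entropy` is hypothetical.

```python
from math import log2


def multilabel_entropy(samples, classes):
    """Entropy of a list of samples, each given as a set of class labels.

    For every class a, p_a is the fraction of samples that contain a;
    the class contributes -(p_a*log2(p_a) + (1-p_a)*log2(1-p_a)).
    """
    if not samples:
        return 0.0
    total = 0.0
    for a in classes:
        p = sum(1 for s in samples if a in s) / len(samples)
        for q in (p, 1 - p):
            if q > 0:  # convention: 0 * log2(0) is treated as 0
                total -= q * log2(q)
    return total
```

For example, `multilabel_entropy([{"a"}, {"b"}], ["a", "b"])` evaluates to 2.0, because each class is present in exactly half of the samples and so contributes one bit, while a set in which every sample has the same labels has entropy 0.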

References

Ashok, P., Křetínský, J., Larsen, K.G., Coënt, A.L., Taankvist, J.H., Weininger, M.: SOS: Safe, optimal and small strategies for hybrid Markov decision processes. Technical report (2019)
Behrmann, G., Cougnard, A., David, A., Fleury, E., Larsen, K.G., Lime, D.: UPPAAL-Tiga: time for playing games!. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 121–125. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73368-3_14
Bernet, J., Janin, D., Walukiewicz, I.: Permissive strategies: from parity games to safety games. ITA 36, 261–275 (2002)
Bollig, B., Wegener, I.: Improving the variable ordering of OBDDs is NP-complete. IEEE Trans. Comput. 45(9), 993–1002 (1996)
Boutilier, C., Dean, T.L., Hanks, S.: Decision-theoretic planning: structural assumptions and computational leverage. J. Artif. Intell. Res. 11, 1–94 (1999)
Boutilier, C., Dearden, R.: Approximating value trees in structured dynamic programming. In: ICML (1996)
Boutilier, C., Dearden, R., Goldszmidt, M.: Exploiting structure in policy construction. In: IJCAI (1995)
Bouyer, P., Markey, N., Olschewski, J., Ummels, M.: Measuring permissiveness in parity games: mean-payoff parity games revisited. In: Bultan, T., Hsiung, P.-A. (eds.) ATVA 2011. LNCS, vol. 6996, pp. 135–149. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24372-1_11
Brázdil, T., Chatterjee, K., Chmelík, M., Fellner, A., Křetínský, J.: Counterexample explanation by learning small strategies in Markov decision processes. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 158–177. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_10
Brázdil, T., Chatterjee, K., Křetínský, J., Toman, V.: Strategy representation by decision trees in reactive synthesis. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10805, pp. 385–407. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89960-2_21
Breiman, L.: Classification and Regression Trees. Routledge, Abingdon (2017)
Bryant, R.E.: Symbolic manipulation of boolean functions using a graphical representation. In: DAC (1985)
Chapman, D., Kaelbling, L.P.: Input generalization in delayed reinforcement learning: an algorithm and performance comparisons. In: IJCAI. Morgan Kaufmann (1991)
Clare, A., King, R.D.: Knowledge discovery in multi-label phenotype data. In: De Raedt, L., Siebes, A. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 42–53. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44794-6_4
Coënt, A.L., Sandretto, J.A.D., Chapoutot, A., Fribourg, L.: An improved algorithm for the control synthesis of nonlinear sampled switched systems. Formal Methods Syst. Design 53(3), 363–383 (2018)
David, A., Du, D., Larsen, K.G., Mikucionis, M., Skou, A.: An evaluation framework for energy aware buildings using statistical model checking. Sci. China Inform. Sci. 55(12), 2694–2707 (2012)
David, A., et al.: On time with minimal expected cost!. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 129–145. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_10
David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16
de Alfaro, L., Kwiatkowska, M., Norman, G., Parker, D., Segala, R.: Symbolic model checking of probabilistic processes using MTBDDs and the Kronecker representation. In: Graf, S., Schwartzbach, M. (eds.) TACAS 2000. LNCS, vol. 1785, pp. 395–410. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-46419-0_27
Dräger, K., Forejt, V., Kwiatkowska, M., Parker, D., Ujma, M.: Permissive controller synthesis for probabilistic systems. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 531–546. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_44
Esposito, F., Malerba, D., Semeraro, G.: Decision tree pruning as a search in the state space. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 165–184. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-56602-3_135
Fehnker, A., Ivančić, F.: Benchmarks for hybrid systems verification. In: Alur, R., Pappas, G.J. (eds.) HSCC 2004. LNCS, vol. 2993, pp. 326–341. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24743-2_22
Garg, P., Löding, C., Madhusudan, P., Neider, D.: ICE: a robust framework for learning invariants. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 69–87. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_5
Girard, A.: Controller synthesis for safety and reachability via approximate bisimulation. Automatica 48(5), 947–953 (2012)
Girard, A.: Low-complexity quantized switching controllers using approximate bisimulation. Nonlinear Anal.: Hybrid Syst. 10, 34–44 (2013)
Girard, A., Martin, S.: Synthesis for constrained nonlinear systems using hybridization and robust controllers on simplices. IEEE Trans. Automat. Control 57(4), 1046–1051 (2012)
Hahn, E.M., Norman, G., Parker, D., Wachter, B., Zhang, L.: Game-based abstraction and controller synthesis for probabilistic hybrid systems. In: QEST (2011)
Hermanns, H., Kwiatkowska, M.Z., Norman, G., Parker, D., Siegle, M.: On the use of MTBDDs for performability analysis and verification of stochastic systems. J. Log. Algebr. Program. 56(1–2), 23–67 (2003)
Hiskens, I.A.: Stability of limit cycles in hybrid systems. In: HICSS (2001)
Hoey, J., St-Aubin, R., Hu, A., Boutilier, C.: SPUDD: stochastic planning using decision diagrams. In: UAI (1999)
Kearns, M., Koller, D.: Efficient reinforcement learning in factored MDPs. In: IJCAI (1999)
Koller, D., Parr, R.: Computing factored value functions for policies in structured MDPs. In: IJCAI (1999)
Kushmerick, N., Hanks, S., Weld, D.: An algorithm for probabilistic least-commitment planning. In: AAAI (1994)
Larsen, K.G., Le Coënt, A., Mikučionis, M., Taankvist, J.H.: Guaranteed control synthesis for continuous systems in Uppaal Tiga. In: Chamberlain, R., Taha, W., Törngren, M. (eds.) CyPhy/WESE 2018. LNCS, vol. 11615, pp. 113–133. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23703-5_6
Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Safe and optimal adaptive cruise control. In: Meyer, R., Platzer, A., Wehrheim, H. (eds.) Correct System Design. LNCS, vol. 9360, pp. 260–277. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23506-6_17
Coënt, A.L., De Vuyst, F., Chamoin, L., Fribourg, L.: Control synthesis of nonlinear sampled switched systems using Euler’s method. In: SNR (2017)
Liu, S., Panangadan, A., Talukder, A., Raghavendra, C.S.: Compact representation of coordinated sampling policies for body sensor networks. In: 2010 IEEE Globecom Workshops (2010)
Majumdar, R., Render, E., Tabuada, P.: Robust discrete synthesis against unspecified disturbances. In: HSCC (2011)
Miner, A., Parker, D.: Symbolic representations and analysis of large probabilistic systems. In: Baier, C., Haverkort, B.R., Hermanns, H., Katoen, J.-P., Siegle, M. (eds.) Validation of Stochastic Systems. LNCS, vol. 2925, pp. 296–338. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24611-4_9
Mingers, J.: An empirical comparison of pruning methods for decision tree induction. Mach. Learn. 4, 227–243 (1989)
Mitchell, T.M.: Machine Learning. McGraw-Hill, Inc., New York (1997)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)
Pyeatt, L.D.: Reinforcement learning with decision trees. Appl. Inform. 26–31 (2003)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier, Amsterdam (2014)
Riddle, P.J., Segal, R., Etzioni, O.: Representation design and brute-force induction in a Boeing manufacturing domain. Appl. Artif. Intell. 8, 125–147 (1994)
Roy, P., Tabuada, P., Majumdar, R.: Pessoa 2.0: a controller synthesis tool for cyber-physical systems. In: HSCC (2011)
Rudell, R.: Dynamic variable ordering for ordered binary decision diagrams. In: CAD (1993)
Rungger, M., Zamani, M.: SCOTS: a tool for the synthesis of symbolic controllers. In: HSCC (2016)
Saoud, A., Girard, A., Fribourg, L.: On the composition of discrete and continuous-time assume-guarantee contracts for invariance. In: ECC (2018)
Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Nori, A.V.: Verification as learning geometric concepts. In: Logozzo, F., Fähndrich, M. (eds.) SAS 2013. LNCS, vol. 7935, pp. 388–411. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38856-9_21
Somenzi, F.: CUDD: CU decision diagram package-release 2.4.2 (2009). http://vlsi.colorado.edu/~fabio/CUDD
Svoreňová, M., Křetínský, J., Chmelík, M., Chatterjee, K., Černá, I., Belta, C.: Temporal logic control for stochastic linear systems using abstraction refinement of probabilistic games. Nonlinear Anal.: Hybrid Syst. 23, 230–253 (2017)
Wimmer, R., et al.: Symblicit calculation of long-run averages for concurrent probabilistic systems. In: QEST (2010)
Zapreev, I.S., Verdier, C., Mazo, M.: Optimal symbolic controllers determinization for BDD storage. In: ADHS (2018)


Author information

Authors and Affiliations

Technical University of Munich, Munich, Germany
Pranav Ashok, Jan Křetínský & Maximilian Weininger
Aalborg University, Aalborg, Denmark
Kim Guldstrand Larsen, Adrien Le Coënt & Jakob Haahr Taankvist


Corresponding author

Correspondence to Maximilian Weininger.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
David Parker
Saarland University, Saarbrücken, Germany
Verena Wolf


About this paper

Cite this paper

Ashok, P., Křetínský, J., Larsen, K.G., Le Coënt, A., Taankvist, J.H., Weininger, M. (2019). SOS: Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes. In: Parker, D., Wolf, V. (eds) Quantitative Evaluation of Systems. QEST 2019. Lecture Notes in Computer Science, vol 11785. Springer, Cham. https://doi.org/10.1007/978-3-030-30281-8_9


DOI: https://doi.org/10.1007/978-3-030-30281-8_9
Published: 04 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30280-1
Online ISBN: 978-3-030-30281-8
eBook Packages: Computer Science, Computer Science (R0)
