Skip to main content

SOS: Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes

  • Conference paper
  • First Online:
Quantitative Evaluation of Systems (QEST 2019)

Abstract

For hybrid Markov decision processes, Stratego can compute strategies that are safe for a given safety property and (in the limit) optimal for a given cost function. Unfortunately, these strategies cannot be exported easily since they are computed as a very long list. In this paper, we demonstrate methods to learn compact representations of the strategies in the form of decision trees. These decision trees are much smaller, more understandable, and can easily be exported as code that can be loaded into embedded systems. Despite the size compression and actual differences to the original strategy, we provide guarantees on both safety and optimality of the decision-tree strategy. On the top, we show how to obtain yet smaller representations, which are still guaranteed safe, but achieve a desired trade-off between size and optimality.

This research was funded in part by TUM IGSSE project 10.06 (PARSEC), the German Research Foundation (DFG) project KR 4890/2-1 “Statistical Unbounded Verification” and the ERC Advanced Grant LASSO.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Note that there is a bijection between length of the run and time, as the time between each step, P, is constant.

  2. 2.

    i.e. a strategy that for every configuration returns a (non-strict) subset of the actions allowed by the safe strategy.

  3. 3.

    Entropy of a set X is \(H(X) = \sum _{a\in A} p_a \log _2(p_a) + (1 - p_a) \log _2(1 - p_a)\), where \(p_a\) is the fraction of samples in X belonging to class a. See [14] for more details.

  4. 4.

    This is because DT learning algorithms are usually configured to avoid overfitting on the dataset.

References

  1. Ashok, P. Křetínský, J., Larsen, K.G., Coënt, A.L., Taankvist, J.H., Weininger, M.: SOS: Safe, optimal and small strategies for hybrid Markov decision processes. Technical report (2019)

    Google Scholar 

  2. Behrmann, G., Cougnard, A., David, A., Fleury, E., Larsen, K.G., Lime, D.: UPPAAL-Tiga: time for playing games!. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 121–125. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73368-3_14

    Chapter  Google Scholar 

  3. Bernet, J., Janin, D., Walukiewicz, I.: Permissive strategies: from parity games to safety games. ITA 36, 261–275 (2002)

    MathSciNet  MATH  Google Scholar 

  4. Bollig, B., Wegener, I.: Improving the variable ordering of OBDDs is NP-complete. IEEE Trans. Comput. 45(9), 993–1002 (1996)

    Article  Google Scholar 

  5. Boutilier, C., Dean, T.L., Hanks, S.: Decision-theoretic planning: structural assumptions and computational leverage. J. Artif. Intell. Res. 11, 1–94 (1999)

    Article  MathSciNet  Google Scholar 

  6. Boutilier, C., Dearden, R.: Approximating value trees in structured dynamic programming. In: ICML (1996)

    Google Scholar 

  7. Boutilier, C., Dearden, R., Goldszmidt, M.: Exploiting structure in policy construction. In: IJCAI (1995)

    Google Scholar 

  8. Bouyer, P., Markey, N., Olschewski, J., Ummels, M.: Measuring permissiveness in parity games: mean-payoff parity games revisited. In: Bultan, T., Hsiung, P.-A. (eds.) ATVA 2011. LNCS, vol. 6996, pp. 135–149. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24372-1_11

    Chapter  Google Scholar 

  9. Brázdil, T., Chatterjee, K., Chmelík, M., Fellner, A., Křetínský, J.: Counterexample explanation by learning small strategies in Markov decision processes. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 158–177. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_10

    Chapter  Google Scholar 

  10. Brázdil, T., Chatterjee, K., Křetínský, J., Toman, V.: Strategy representation by decision trees in reactive synthesis. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10805, pp. 385–407. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89960-2_21

    Chapter  Google Scholar 

  11. Breiman, L.: Classification and Regression Trees. Routledge, Abingdon (2017)

    Book  Google Scholar 

  12. Bryant, R.E.: Symbolic manipulation of boolean functions using a graphical representation. In: DAC (1985)

    Google Scholar 

  13. Chapman, D., Kaelbling, L.P.: Input generalization in delayed reinforcement learning: an algorithm and performance comparisons. In: IJCAI. Morgan Kaufmann (1991)

    Google Scholar 

  14. Clare, A., King, R.D.: Knowledge discovery in multi-label phenotype data. In: De Raedt, L., Siebes, A. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 42–53. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44794-6_4

    Chapter  MATH  Google Scholar 

  15. Coënt, A.L., Sandretto, J.A.D., Chapoutot, A., Fribourg, L.: An improved algorithm for the control synthesis of nonlinear sampled switched systems. Formal Methods Syst. Design 53(3), 363–383 (2018)

    Article  Google Scholar 

  16. David, A., Du, D., Larsen, K.G., Mikucionis, M., Skou, A.: An evaluation framework for energy aware buildings using statistical model checking. Sci. China Inform. Sci. 55(12), 2694–2707 (2012)

    Article  Google Scholar 

  17. David, A., et al.: On time with minimal expected cost!. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 129–145. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_10

    Chapter  Google Scholar 

  18. David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16

    Chapter  Google Scholar 

  19. de Alfaro, L., Kwiatkowska, M., Norman, G., Parker, D., Segala, R.: Symbolic model checking of probabilistic processes using MTBDDs and the kronecker representation. In: Graf, S., Schwartzbach, M. (eds.) TACAS 2000. LNCS, vol. 1785, pp. 395–410. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-46419-0_27

    Chapter  MATH  Google Scholar 

  20. Dräger, K., Forejt, V., Kwiatkowska, M., Parker, D., Ujma, M.: Permissive controller synthesis for probabilistic systems. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 531–546. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_44

    Chapter  MATH  Google Scholar 

  21. Esposito, F., Malerba, D., Semeraro, G.: Decision tree pruning as a search in the state space. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 165–184. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-56602-3_135

    Chapter  Google Scholar 

  22. Fehnker, A., Ivančić, F.: Benchmarks for hybrid systems verification. In: Alur, R., Pappas, G.J. (eds.) HSCC 2004. LNCS, vol. 2993, pp. 326–341. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24743-2_22

    Chapter  MATH  Google Scholar 

  23. Garg, P., Löding, C., Madhusudan, P., Neider, D.: ICE: a robust framework for learning invariants. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 69–87. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_5

    Chapter  Google Scholar 

  24. Girard, A.: Controller synthesis for safety and reachability via approximate bisimulation. Automatica 48(5), 947–953 (2012)

    Article  MathSciNet  Google Scholar 

  25. Girard, A.: Low-complexity quantized switching controllers using approximate bisimulation. Nonlinear Anal.: Hybrid Syst. 10, 34–44 (2013)

    MathSciNet  MATH  Google Scholar 

  26. Girard, A., Martin, S.: Synthesis for constrained nonlinear systems using hybridization and robust controllers on simplices. IEEE Trans. Automat. Control 57(4), 1046–1051 (2012)

    Article  MathSciNet  Google Scholar 

  27. Hahn, E.M., Norman, G., Parker, D., Wachter, B., Zhang, L.: Game-based abstraction and controller synthesis for probabilistic hybrid systems. In: QEST (2011)

    Google Scholar 

  28. Hermanns, H., Kwiatkowska, M.Z., Norman, G., Parker, D., Siegle, M.: On the use of mtbdds for performability analysis and verification of stochastic systems. J. Log. Algebr. Program. 56(1–2), 23–67 (2003)

    Article  MathSciNet  Google Scholar 

  29. Hiskens, I.A.: Stability of limit cycles in hybrid systems. In: HICSS (2001)

    Google Scholar 

  30. Hoey, J., St-Aubin, R., Hu, A., Boutilier, C.: SPUDD: stochastic planning using decision diagrams. In: UAI (1999)

    Google Scholar 

  31. Kearns, M., Koller, D.: Efficient reinforcement learning in factored MDPs. In: IJCAI (1999)

    Google Scholar 

  32. Koller, D., Parr, R.: Computing factored value functions for policies in structured MDPs. In: IJCAI (1999)

    Google Scholar 

  33. Kushmerick, N., Hanks, S., Weld, D.: An algorithm for probabilistic least-commitment planning. In: AAAI (1994)

    Google Scholar 

  34. Larsen, K.G., Le Coënt, A., Mikučionis, M., Taankvist, J.H.: Guaranteed control synthesis for continuous systems in Uppaal Tiga. In: Chamberlain, R., Taha, W., Törngren, M. (eds.) CyPhy/WESE -2018. LNCS, vol. 11615, pp. 113–133. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-23703-5_6

    Chapter  Google Scholar 

  35. Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Safe and optimal adaptive cruise control. In: Meyer, R., Platzer, A., Wehrheim, H. (eds.) Correct System Design. LNCS, vol. 9360, pp. 260–277. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23506-6_17

    Chapter  Google Scholar 

  36. Coënt, A.L., De Vuyst, F., Chamoin, L., Fribourg, L.: Control synthesis of nonlinear sampled switched systems using Euler’s method. In: SNR (2017)

    Google Scholar 

  37. Liu, S., Panangadan, A., Talukder, A., Raghavendra, C.S.: Compact representation of coordinated sampling policies for body sensor networks. In: 2010 IEEE Globecom Workshops (2010)

    Google Scholar 

  38. Majumdar, R., Render, E., Tabuada, P.: Robust discrete synthesis against unspecified disturbances. In: HSCC (2011)

    Google Scholar 

  39. Miner, A., Parker, D.: Symbolic representations and analysis of large probabilistic systems. In: Baier, C., Haverkort, B.R., Hermanns, H., Katoen, J.-P., Siegle, M. (eds.) Validation of Stochastic Systems. LNCS, vol. 2925, pp. 296–338. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24611-4_9

    Chapter  MATH  Google Scholar 

  40. Mingers, J.: An empirical comparison of pruning methods for decision tree induction. Mach. Learn. 4, 227–243 (1989)

    Article  Google Scholar 

  41. Mitchell, T.M.: Machine Learning. McGraw-Hill, Inc., New York (1997)

    MATH  Google Scholar 

  42. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  43. Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)

    Book  Google Scholar 

  44. Pyeatt, L.D.: Reinforcement learning with decision trees. Appl. Inform. 26–31 (2003)

    Google Scholar 

  45. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)

    Google Scholar 

  46. Quinlan, J.R.: C4.5: Programs for Machine Learning. Elsevier, Amsterdam (2014)

    Google Scholar 

  47. Riddle, P.J., Segal, R., Etzioni, O.: Representation design and brut-force induction in a boeingmanufacturing domain. Appl. Artif. Intell. 8, 125–147 (1994)

    Article  Google Scholar 

  48. Roy, P., Tabuada, P., Majumdar, R.: Pessoa 2.0: a controller synthesis tool for cyber-physical systems. In: HSCC (2011)

    Google Scholar 

  49. Rudell, R.: Dynamic variable ordering for ordered binary decision diagrams. In: CAD (1993)

    Google Scholar 

  50. Rungger, M., Zamani, M.: Scots: a tool for the synthesis of symbolic controllers. In: HSCC (2016)

    Google Scholar 

  51. Saoud, A., Girard, A., Fribourg, L.: On the composition of discrete and continuous-time assume-guarantee contracts for invariance. In: ECC (2018)

    Google Scholar 

  52. Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Nori, A.V.: Verification as learning geometric concepts. In: Logozzo, F., Fähndrich, M. (eds.) SAS 2013. LNCS, vol. 7935, pp. 388–411. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38856-9_21

    Chapter  Google Scholar 

  53. Somenzi, F.: CUDD: CU decision diagram package-release 2.4. 2 (2009). http://vlsi.colorado.edu/~fabio/CUDD

  54. Svoreňová, M., Křetínskỳ, J., Chmelík, M., Chatterjee, K., Černá, I., Belta, C.: Temporal logic control for stochastic linear systems using abstraction refinement of probabilistic games. Nonlinear Anal.: Hybrid Syst. 23, 230–253 (2017)

    MathSciNet  MATH  Google Scholar 

  55. Wimmer, R., et al.: Symblicit calculation of long-run averages for concurrent probabilistic systems. In: QEST (2010)

    Google Scholar 

  56. Zapreev, I.S., Verdier, C., Mazo, M.: Optimal symbolic controllers determinization for BDD storage. In: ADHS (2018)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maximilian Weininger .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ashok, P., Křetínský, J., Larsen, K.G., Le Coënt, A., Taankvist, J.H., Weininger, M. (2019). SOS: Safe, Optimal and Small Strategies for Hybrid Markov Decision Processes. In: Parker, D., Wolf, V. (eds) Quantitative Evaluation of Systems. QEST 2019. Lecture Notes in Computer Science(), vol 11785. Springer, Cham. https://doi.org/10.1007/978-3-030-30281-8_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-30281-8_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30280-1

  • Online ISBN: 978-3-030-30281-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics