Finding Best k Policies

  • Conference paper
Algorithmic Decision Theory (ADT 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5783)

Abstract

An optimal probabilistic-planning algorithm solves a problem, usually modeled as a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem: finding the k best policies of a given problem. For k > 1, the k best policies cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried by the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem with this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm based on our theoretical contribution: a proof that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, while the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm scales well.
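The one-state-difference property stated in the abstract suggests a simple search: start from an optimal policy and repeatedly take the best not-yet-selected policy among those that differ from an already selected one on exactly one state. The Python sketch below illustrates that idea on a toy problem and checks it against exhaustive enumeration. It is not the authors' algorithm and does not reproduce their complexity bounds. Everything specific in it is an assumption made for illustration: the discounted-reward criterion with `GAMMA = 0.9`, ranking policies by their value at a fixed start state, the random toy MDP, and all function names (`make_mdp`, `evaluate`, `optimal_policy`, `k_best`, `brute_force`).

```python
import heapq
import itertools
import random

GAMMA = 0.9  # assumed discount factor; the abstract does not fix a criterion


def make_mdp(n_states, n_actions, seed=0):
    """Toy random MDP: P[s][a] is a distribution over next states, R[s][a] a reward."""
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def q_value(s, a, V, P, R):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))


def evaluate(policy, P, R, tol=1e-12):
    """Iterative policy evaluation; returns the value of every state."""
    V = [0.0] * len(P)
    while True:
        new_V = [q_value(s, policy[s], V, P, R) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


def optimal_policy(P, R):
    """Plain policy iteration (a stand-in for any optimal planner)."""
    n, m = len(P), len(P[0])
    policy = [0] * n
    while True:
        V = evaluate(policy, P, R)
        improved = list(policy)
        for s in range(n):
            for a in range(m):
                # Switch only on a clear improvement to avoid flip-flopping on ties.
                if q_value(s, a, V, P, R) > q_value(s, improved[s], V, P, R) + 1e-9:
                    improved[s] = a
        if improved == policy:
            return tuple(policy)
        policy = improved


def k_best(P, R, k, start=0):
    """Best-first search over policies that differ on exactly one state."""
    n, m = len(P), len(P[0])
    first = optimal_policy(P, R)
    seen = {first}
    heap = [(-evaluate(first, P, R)[start], first)]  # max-heap via negated value
    result = []
    while heap and len(result) < k:
        neg_value, policy = heapq.heappop(heap)
        result.append((policy, -neg_value))
        for s in range(n):
            for a in range(m):
                neighbor = policy[:s] + (a,) + policy[s + 1:]
                if neighbor not in seen:
                    seen.add(neighbor)
                    heapq.heappush(heap, (-evaluate(neighbor, P, R)[start], neighbor))
    return result


def brute_force(P, R, k, start=0):
    """Reference answer: evaluate every deterministic policy and sort."""
    n, m = len(P), len(P[0])
    scored = [(evaluate(pi, P, R)[start], pi)
              for pi in itertools.product(range(m), repeat=n)]
    scored.sort(reverse=True)
    return [(pi, v) for v, pi in scored[:k]]
```

Calling `k_best(P, R, 5)` on an MDP built with `make_mdp(4, 2)` returns five `(policy, value)` pairs in non-increasing order of start-state value. Comparing values rather than policies against `brute_force` is deliberate, since tied policies may be returned in either order.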

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dai, P., Goldsmith, J. (2009). Finding Best k Policies. In: Rossi, F., Tsoukias, A. (eds) Algorithmic Decision Theory. ADT 2009. Lecture Notes in Computer Science, vol 5783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04428-1_13

  • DOI: https://doi.org/10.1007/978-3-642-04428-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04427-4

  • Online ISBN: 978-3-642-04428-1

  • eBook Packages: Computer Science, Computer Science (R0)
