Finding Best k Policies

Dai, Peng; Goldsmith, Judy

doi:10.1007/978-3-642-04428-1_13

Peng Dai &
Judy Goldsmith

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5783)

Included in the following conference series:

International Conference on Algorithmic Decision Theory

Abstract

An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem: finding the k best policies of such a process. For k > 1, the k best policies cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem using this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm based on our theoretical contribution: a proof that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm scales well.
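The one-state-difference property suggests a simple search: starting from an optimal policy, the next-best policy can always be found among the one-state neighbors of policies already ranked. The sketch below illustrates that idea only; it is not the authors' algorithm and makes no attempt to match its complexity bounds. It assumes a discounted-cost model, ranks policies by expected cost from a designated start state, and uses a hypothetical three-state toy MDP; the names `COST`, `TRANS`, `evaluate`, `neighbors`, and `k_best_by_neighbors` are all invented for illustration.

```python
import heapq
from itertools import product

GAMMA = 0.9          # discount factor (assumption; the abstract fixes no cost model)
STATES = [0, 1, 2]
ACTIONS = [0, 1]
START = 0            # policies are ranked by expected cost from this state (assumption)

# Hypothetical toy MDP: COST[s][a] is the immediate cost of action a in state s,
# TRANS[s][a] maps each successor state to its probability.
COST = {0: [1.0, 2.5], 1: [2.0, 0.5], 2: [0.3, 1.7]}
TRANS = {
    0: [{1: 0.8, 2: 0.2}, {0: 0.1, 2: 0.9}],
    1: [{0: 0.5, 2: 0.5}, {1: 0.6, 0: 0.4}],
    2: [{2: 0.7, 1: 0.3}, {0: 0.5, 1: 0.5}],
}

def q_value(s, a, v):
    """One-step lookahead cost of taking action a in state s, given values v."""
    return COST[s][a] + GAMMA * sum(p * v[t] for t, p in TRANS[s][a].items())

def evaluate(policy, sweeps=500):
    """Expected discounted cost from START under a fixed policy (iterative evaluation)."""
    v = [0.0] * len(STATES)
    for _ in range(sweeps):
        v = [q_value(s, policy[s], v) for s in STATES]
    return v[START]

def optimal_policy(sweeps=500):
    """Value iteration followed by greedy action selection."""
    v = [0.0] * len(STATES)
    for _ in range(sweeps):
        v = [min(q_value(s, a, v) for a in ACTIONS) for s in STATES]
    return tuple(min(ACTIONS, key=lambda a: q_value(s, a, v)) for s in STATES)

def neighbors(policy):
    """All policies that differ from `policy` on exactly one state."""
    for s in STATES:
        for a in ACTIONS:
            if a != policy[s]:
                yield policy[:s] + (a,) + policy[s + 1:]

def k_best_by_neighbors(k):
    """Best-first search: repeatedly pop the cheapest candidate, then add its neighbors."""
    first = optimal_policy()
    heap = [(evaluate(first), first)]
    seen = {first}
    ranked = []
    while heap and len(ranked) < k:
        cost, policy = heapq.heappop(heap)
        ranked.append((cost, policy))
        for nb in neighbors(policy):
            if nb not in seen:
                seen.add(nb)
                heapq.heappush(heap, (evaluate(nb), nb))
    return ranked

def k_best_brute_force(k):
    """Reference answer: evaluate every policy and sort (exponential in the number of states)."""
    return sorted((evaluate(p), p) for p in product(ACTIONS, repeat=len(STATES)))[:k]

if __name__ == "__main__":
    for cost, policy in k_best_by_neighbors(4):
        print(policy, round(cost, 4))
```

On this toy instance the neighbor-based search can be checked against `k_best_brute_force`, which enumerates all eight policies. The search evaluates only policies adjacent to ones it has already ranked, which is where the structural result would save work on larger problems.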

Author information

Authors and Affiliations

Computer Science & Engineering, University of Washington, Seattle, WA 98195-2350, USA
Peng Dai
Dept. of Comp. Sci., Univ. of Kentucky, Lexington, KY 40506-0046, USA
Judy Goldsmith

Editor information

Editors and Affiliations

Department of Pure and Applied Mathematics, University of Padova, Via Trieste 63, 35121, Padova, Italy
Francesca Rossi
LAMSADE-CNRS, University of Paris Dauphine, Place du Maréchal De Lattre de Tassigny, 75775, Paris Cedex 16, France
Alexis Tsoukias

About this paper

Cite this paper

Dai, P., Goldsmith, J. (2009). Finding Best k Policies. In: Rossi, F., Tsoukias, A. (eds) Algorithmic Decision Theory. ADT 2009. Lecture Notes in Computer Science, vol 5783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04428-1_13

DOI: https://doi.org/10.1007/978-3-642-04428-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04427-4
Online ISBN: 978-3-642-04428-1
eBook Packages: Computer Science, Computer Science (R0)
