
On Variability of Optimal Policies in Markov Decision Processes

  • Chapter
Data Analysis and Decision Support

Abstract

Both the total reward criterion and the average reward criterion commonly used in Markov decision processes lead to an optimal policy that maximizes the associated expected value. The paper reviews these standard approaches and studies the distribution functions obtained by applying an optimal policy. In particular, an efficient extrapolation method is proposed, which results from the control of Markov decision models with an absorbing set.
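The distinction the abstract draws, between maximizing the expected reward and the distribution of the reward actually realized, can be illustrated with a minimal sketch. All states, actions, rewards, and transition probabilities below are hypothetical and not taken from the paper: value iteration finds the policy that is optimal for the expected total discounted reward criterion, and Monte Carlo simulation then exposes the spread of the discounted reward under that same optimal policy.

```python
import random

# A tiny finite MDP (hypothetical example, not from the paper):
# two states, two actions.  P[s][a] is a list of (next_state, prob);
# R[s][a] is the immediate reward for action a in state s.
P = {0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
     1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]}}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}
beta = 0.9  # discount factor

# Value iteration for the expected total discounted reward criterion.
V = {0: 0.0, 1: 0.0}
for _ in range(500):
    V = {s: max(R[s][a] + beta * sum(p * V[t] for t, p in P[s][a])
                for a in P[s])
         for s in P}
policy = {s: max(P[s], key=lambda a: R[s][a]
                 + beta * sum(p * V[t] for t, p in P[s][a]))
          for s in P}

# The optimal policy maximizes the *expected* value V[s]; the realized
# discounted reward of a single run is still random.  Simulate it.
def episode(s=0, horizon=200):
    total, disc = 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        total += disc * R[s][a]
        disc *= beta
        u, acc = random.random(), 0.0
        for t, p in P[s][a]:
            acc += p
            if u <= acc:
                s = t
                break
    return total

random.seed(0)
samples = [episode() for _ in range(2000)]
mean = sum(samples) / len(samples)            # close to V[0]
spread = max(samples) - min(samples)          # variability hidden by the mean
```

The sample mean agrees with the computed value V[0], while the spread of the samples shows the variability that the expected-value criterion alone does not capture; closed-form distribution functions are rarely available, which motivates the paper's interest in efficient computational methods.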




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Waldmann, K.-H. (2005). On Variability of Optimal Policies in Markov Decision Processes. In: Baier, D., Decker, R., Schmidt-Thieme, L. (eds) Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28397-8_20
