Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 453)

Abstract

In this paper, we consider a decision problem modeled as a Markov decision process (MDP). Solving an MDP amounts to searching, within a given set of policies, for one that optimizes a performance criterion. We address the discounted criterion, with the aim of characterizing the policies that provide the best sequence of rewards. In the literature, three main approaches are applied to solve MDPs with a discounted criterion: linear programming, value iteration, and policy iteration. Here, we take an optimization approach to discounted MDPs: we describe an optimization model based on minimizing different norms of the Optimal Bellman Residual. In general, this model can be formulated as a DC (Difference of Convex functions) program, to which the unified DC programming and DCA (DC Algorithms) framework applies. We propose a new optimization model and a suitable DC decomposition for it. Numerical experiments are performed on stationary Garnet problems. Comparative results against the linear programming method for discounted MDPs illustrate the efficiency of the proposed approach in terms of the quality of the obtained solutions.
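To make the objective in the abstract concrete: the optimal Bellman operator \((T^*V)(s) = \max_a [R(s,a) + \gamma \sum_{s'} P(s'|s,a) V(s')]\) is a pointwise maximum of affine functions of \(V\), hence convex in \(V\), which is why norms of the Optimal Bellman Residual \(V - T^*V\) admit a DC decomposition. The sketch below is a hypothetical illustration under standard tabular assumptions, not the paper's model or code: it generates a small Garnet-style MDP and minimizes the squared L2 residual norm with a generic solver standing in for DCA; the helper names (`garnet`, `bellman_opt`, `obr_sq_norm`) and all parameter values are assumptions.

```python
# Minimal illustrative sketch (not the authors' implementation): build a
# random Garnet-style MDP and minimize the squared L2 norm of the Optimal
# Bellman Residual over the value function V.  The paper applies DCA to a
# DC decomposition of such an objective; a generic scipy solver stands in
# here purely for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def garnet(n_states=20, n_actions=4, branching=3):
    """Random Garnet MDP: each (s, a) pair transitions to `branching`
    randomly chosen successor states with Dirichlet-distributed probabilities."""
    P = np.zeros((n_actions, n_states, n_states))   # P[a, s, s'] transition kernel
    R = rng.standard_normal((n_states, n_actions))  # R[s, a] rewards
    for a in range(n_actions):
        for s in range(n_states):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    return P, R

P, R = garnet()
gamma = 0.95  # discount factor

def bellman_opt(V):
    """Optimal Bellman operator: (T*V)(s) = max_a R(s,a) + gamma * E[V(s')]."""
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return Q.max(axis=1)

def obr_sq_norm(V):
    """Squared L2 norm of the Optimal Bellman Residual, ||V - T*V||_2^2."""
    r = V - bellman_opt(V)
    return r @ r

# Minimize the residual norm; its global minimum is zero, attained at the
# optimal value function V*.
res = minimize(obr_sq_norm, np.zeros(P.shape[1]), method="L-BFGS-B")
print("residual norm at solution:", np.sqrt(res.fun))
```

At the global minimum the residual vanishes and the minimizer is the optimal value function, from which a greedy policy can be read off.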

Author information

Correspondence to Vinh Thanh Ho.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ho, V.T., Le Thi, H.A. (2016). Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA. In: Nguyen, T.B., van Do, T., An Le Thi, H., Nguyen, N.T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 453. Springer, Cham. https://doi.org/10.1007/978-3-319-38884-7_4

  • DOI: https://doi.org/10.1007/978-3-319-38884-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38883-0

  • Online ISBN: 978-3-319-38884-7

  • eBook Packages: Engineering, Engineering (R0)
