Abstract
In this paper, we consider a decision problem modeled by Markov decision processes (MDPs). Solving an MDP amounts to searching, within a given set of policies, for one that optimizes a performance criterion. We address the discounted criterion, with the aim of characterizing the policies that provide the best sequence of rewards. In the literature, three main approaches are applied to solve MDPs with a discounted criterion: linear programming, value iteration, and policy iteration. Here we are interested in an optimization approach to discounted MDPs: we describe an optimization model based on minimizing different norms of the Optimal Bellman Residual. In general, this model can be formulated as a DC (Difference of Convex functions) program, to which the unified DC programming framework and DCA (DC Algorithm) can be applied. We propose a new optimization model and a suitable DC decomposition for it. Numerical experiments are performed on stationary Garnet problems. Comparative results against the linear programming method for discounted MDPs illustrate the efficiency of the proposed approach in terms of the quality of the obtained solutions.
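The abstract contrasts the proposed DC programming approach with the linear programming baseline and centers on the Optimal Bellman Residual it minimizes. As a minimal illustration only (not the authors' code; the problem sizes, seed, and Garnet-style random generator below are assumptions), the sketch builds a small random MDP, solves it via the classical LP formulation of discounted MDPs, and evaluates the Optimal Bellman Residual at the LP solution, where it should vanish:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative sizes and discount factor (assumptions, not the paper's setup).
rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9

# Garnet-style random MDP: P[a] is the nS x nS transition matrix of action a,
# R[s, a] is the immediate reward for taking action a in state s.
P = rng.random((nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)          # normalize rows into distributions
R = rng.random((nS, nA))

# Classical LP for discounted MDPs: minimize sum_s V(s) subject to
# V(s) >= R(s, a) + gamma * sum_{s'} P(s' | s, a) V(s') for every (s, a),
# rewritten as (gamma * P[a] - I) V <= -R[:, a].
A_ub = np.vstack([gamma * P[a] - np.eye(nS) for a in range(nA)])
b_ub = np.concatenate([-R[:, a] for a in range(nA)])
lp = linprog(c=np.ones(nS), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * nS)
V = lp.x

# Optimal Bellman Residual of V: OBR(V) = T V - V, where
# (T V)(s) = max_a [ R(s, a) + gamma * (P[a] V)(s) ].
Q = R + gamma * np.stack([P[a] @ V for a in range(nA)], axis=1)
obr = Q.max(axis=1) - V
print("||OBR(V*)||_inf =", np.abs(obr).max())
```

At the LP optimum the residual is zero up to solver tolerance; the DC programming approach of the paper instead minimizes a norm of this residual directly over value functions, which leads to a nonconvex objective admitting a DC decomposition.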
Change history
11 December 2019
In the original version of the book, the following belated corrections have been incorporated.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Ho, V.T., Le Thi, H.A. (2016). Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA. In: Nguyen, T.B., van Do, T., An Le Thi, H., Nguyen, N.T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 453. Springer, Cham. https://doi.org/10.1007/978-3-319-38884-7_4
Print ISBN: 978-3-319-38883-0
Online ISBN: 978-3-319-38884-7