Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 453)

Abstract

In this paper, we consider a decision problem modeled as a Markov decision process (MDP). Solving an MDP amounts to searching, within a given set of policies, for one that optimizes a performance criterion. We address the discounted criterion, with the aim of characterizing the policies that provide the best sequence of rewards. In the literature, three main approaches are applied to solve MDPs with a discounted criterion: linear programming, value iteration, and policy iteration. Here, we take an optimization approach to discounted MDPs: we describe an optimization model based on minimizing different norms of the Optimal Bellman Residual. In general, this model can be formulated as a DC (Difference of Convex functions) program, to which the unified DC programming and DCA (DC Algorithms) framework applies. We propose a new optimization model and a suitable DC decomposition for it. Numerical experiments are performed on stationary Garnet problems. Comparative results against the linear programming method for discounted MDPs illustrate the efficiency of the proposed approach in terms of the quality of the obtained solutions.
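To make the objective in the abstract concrete: the optimal Bellman operator \((T^*V)(s) = \max_a [R(s,a) + \gamma \sum_{s'} P(s'|s,a) V(s')]\) is a pointwise maximum of affine functions of \(V\), hence convex in \(V\), which is why norms of the Optimal Bellman Residual \(V - T^*V\) admit a DC decomposition. The sketch below is a hypothetical illustration under standard tabular assumptions, not the paper's model or code: it generates a small Garnet-style MDP and minimizes the squared L2 residual norm with a generic solver standing in for DCA; the helper names (`garnet`, `bellman_opt`, `obr_sq_norm`) and all parameter values are assumptions.

```python
# Minimal illustrative sketch (not the authors' implementation): build a
# random Garnet-style MDP and minimize the squared L2 norm of the Optimal
# Bellman Residual over the value function V.  The paper applies DCA to a
# DC decomposition of such an objective; a generic scipy solver stands in
# here purely for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def garnet(n_states=20, n_actions=4, branching=3):
    """Random Garnet MDP: each (s, a) pair transitions to `branching`
    randomly chosen successor states with Dirichlet-distributed probabilities."""
    P = np.zeros((n_actions, n_states, n_states))   # P[a, s, s'] transition kernel
    R = rng.standard_normal((n_states, n_actions))  # R[s, a] rewards
    for a in range(n_actions):
        for s in range(n_states):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    return P, R

P, R = garnet()
gamma = 0.95  # discount factor

def bellman_opt(V):
    """Optimal Bellman operator: (T*V)(s) = max_a R(s,a) + gamma * E[V(s')]."""
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    return Q.max(axis=1)

def obr_sq_norm(V):
    """Squared L2 norm of the Optimal Bellman Residual, ||V - T*V||_2^2."""
    r = V - bellman_opt(V)
    return r @ r

# Minimize the residual norm; its global minimum is zero, attained at the
# optimal value function V*.
res = minimize(obr_sq_norm, np.zeros(P.shape[1]), method="L-BFGS-B")
print("residual norm at solution:", np.sqrt(res.fun))
```

At the global minimum the residual vanishes and the minimizer is the optimal value function, from which a greedy policy can be read off.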

Author information

Correspondence to Vinh Thanh Ho.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ho, V.T., Le Thi, H.A. (2016). Solving an Infinite-Horizon Discounted Markov Decision Process by DC Programming and DCA. In: Nguyen, T.B., van Do, T., An Le Thi, H., Nguyen, N.T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 453. Springer, Cham. https://doi.org/10.1007/978-3-319-38884-7_4

  • DOI: https://doi.org/10.1007/978-3-319-38884-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38883-0

  • Online ISBN: 978-3-319-38884-7

  • eBook Packages: Engineering, Engineering (R0)
