
RP-DQN: An Application of Q-Learning to Vehicle Routing Problems

  • Conference paper

KI 2021: Advances in Artificial Intelligence (KI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12873)


Abstract

In this paper we present a new approach to tackling complex routing problems, built on an improved state representation that utilizes model capacity more effectively than previous methods. We enable this by training from temporal differences; specifically, Q-learning is employed. We show that our approach achieves state-of-the-art performance among autoregressive policies that construct solutions by sequentially inserting nodes, evaluated on the Capacitated Vehicle Routing Problem (CVRP). Additionally, we are the first to tackle the Multiple Depot Vehicle Routing Problem (MDVRP) with Reinforcement Learning (RL), and we demonstrate that this problem type benefits substantially more from our approach than from other Machine Learning (ML) methods.
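The paper's RP-DQN model itself is not reproduced here, but the construction scheme the abstract describes, an autoregressive policy that builds a CVRP solution by repeatedly selecting the feasible node with the highest estimated Q-value, can be sketched minimally. In this illustration the names `q_value` and `construct_cvrp_tour` are hypothetical, and the learned Q-network is replaced by a simple negative-distance heuristic as a stand-in for the trained value estimator:

```python
import math

def q_value(current, candidate, coords):
    # Stand-in for a learned Q-network: score a candidate next node by
    # negative travel distance. RP-DQN would instead use a neural network
    # trained from temporal differences.
    (x1, y1), (x2, y2) = coords[current], coords[candidate]
    return -math.hypot(x2 - x1, y2 - y1)

def construct_cvrp_tour(coords, demands, capacity, depot=0):
    """Autoregressively build a CVRP solution: at each step insert the
    feasible node with the highest Q-value; return to the depot (resetting
    remaining capacity) when no unvisited node fits the vehicle."""
    unvisited = {i for i in range(len(coords)) if i != depot}
    route, current, remaining = [depot], depot, capacity
    while unvisited:
        feasible = [n for n in unvisited if demands[n] <= remaining]
        if not feasible:                      # vehicle full: restock at depot
            route.append(depot)
            current, remaining = depot, capacity
            continue
        nxt = max(feasible, key=lambda n: q_value(current, n, coords))
        route.append(nxt)
        unvisited.discard(nxt)
        remaining -= demands[nxt]
        current = nxt
    route.append(depot)
    return route

# Tiny example instance: depot at index 0, three customers with demand 4 each.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
demands = [0, 4, 4, 4]
print(construct_cvrp_tour(coords, demands, capacity=8))
```

Replacing the heuristic scorer with a trained network while keeping the same feasibility masking is what makes the construction a Q-learning policy rather than a fixed greedy rule.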

A. Bdeir, S. Boeder, T. Dernedde, and K. Tkachuk contributed equally.



Acknowledgement

This work was co-funded by the research project L2O (https://www.ismll.uni-hildesheim.de/projekte/l2o_en.html), supported by the German Federal Ministry of Education and Research (BMBF) under grant agreement no. 01IS20013A, and by the European Regional Development Fund project TrAmP (https://www.ismll.uni-hildesheim.de/projekte/tramp.html) under grant agreement no. 85023841.

Author information


Corresponding author

Correspondence to Tim Dernedde.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Bdeir, A., Boeder, S., Dernedde, T., Tkachuk, K., Falkner, J.K., Schmidt-Thieme, L. (2021). RP-DQN: An Application of Q-Learning to Vehicle Routing Problems. In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol. 12873. Springer, Cham. https://doi.org/10.1007/978-3-030-87626-5_1


  • DOI: https://doi.org/10.1007/978-3-030-87626-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87625-8

  • Online ISBN: 978-3-030-87626-5

  • eBook Packages: Computer Science (R0)
