
RP-DQN: An Application of Q-Learning to Vehicle Routing Problems

  • Conference paper

KI 2021: Advances in Artificial Intelligence (KI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12873)


Abstract

In this paper we present a new approach to tackling complex routing problems, built on an improved state representation that utilizes model capacity more effectively than previous methods. We enable this by training from temporal differences; specifically, Q-learning is employed. We show that our approach achieves state-of-the-art performance among autoregressive policies that construct solutions by sequentially inserting nodes, evaluated on the Capacitated Vehicle Routing Problem (CVRP). Additionally, we are the first to tackle the Multiple Depot Vehicle Routing Problem (MDVRP) with Reinforcement Learning (RL), and we demonstrate that this problem type benefits substantially more from our approach than from other Machine Learning (ML) methods.
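The paper's RP-DQN model itself is not reproduced here, but the construction scheme the abstract describes, an autoregressive policy that builds a CVRP solution by repeatedly selecting the feasible node with the highest estimated Q-value, can be sketched minimally. In this illustration the names `q_value` and `construct_cvrp_tour` are hypothetical, and the learned Q-network is replaced by a simple negative-distance heuristic as a stand-in for the trained value estimator:

```python
import math

def q_value(current, candidate, coords):
    # Stand-in for a learned Q-network: score a candidate next node by
    # negative travel distance. RP-DQN would instead use a neural network
    # trained from temporal differences.
    (x1, y1), (x2, y2) = coords[current], coords[candidate]
    return -math.hypot(x2 - x1, y2 - y1)

def construct_cvrp_tour(coords, demands, capacity, depot=0):
    """Autoregressively build a CVRP solution: at each step insert the
    feasible node with the highest Q-value; return to the depot (resetting
    remaining capacity) when no unvisited node fits the vehicle."""
    unvisited = {i for i in range(len(coords)) if i != depot}
    route, current, remaining = [depot], depot, capacity
    while unvisited:
        feasible = [n for n in unvisited if demands[n] <= remaining]
        if not feasible:                      # vehicle full: restock at depot
            route.append(depot)
            current, remaining = depot, capacity
            continue
        nxt = max(feasible, key=lambda n: q_value(current, n, coords))
        route.append(nxt)
        unvisited.discard(nxt)
        remaining -= demands[nxt]
        current = nxt
    route.append(depot)
    return route

# Tiny example instance: depot at index 0, three customers with demand 4 each.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
demands = [0, 4, 4, 4]
print(construct_cvrp_tour(coords, demands, capacity=8))
```

Replacing the heuristic scorer with a trained network while keeping the same feasibility masking is what makes the construction a Q-learning policy rather than a fixed greedy rule.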

A. Bdeir, S. Boeder, T. Dernedde, and K. Tkachuk contributed equally.



Acknowledgement

This work was co-funded by the research project L2O (https://www.ismll.uni-hildesheim.de/projekte/l2o_en.html), supported by the German Federal Ministry of Education and Research (BMBF) under grant agreement no. 01IS20013A, and by the European Regional Development Fund project TrAmP (https://www.ismll.uni-hildesheim.de/projekte/tramp.html) under grant agreement no. 85023841.

Author information


Corresponding author

Correspondence to Tim Dernedde.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Bdeir, A., Boeder, S., Dernedde, T., Tkachuk, K., Falkner, J.K., Schmidt-Thieme, L. (2021). RP-DQN: An Application of Q-Learning to Vehicle Routing Problems. In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol. 12873. Springer, Cham. https://doi.org/10.1007/978-3-030-87626-5_1


  • DOI: https://doi.org/10.1007/978-3-030-87626-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87625-8

  • Online ISBN: 978-3-030-87626-5

  • eBook Packages: Computer Science (R0)
