Battery Management for Automated Warehouses via Deep Reinforcement Learning

Deng, Yanchen; An, Bo; Qiu, Zongmin; Li, Liuxi; Wang, Yong; Xu, Yinghui

doi:10.1007/978-3-030-64096-5_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12547)

Included in the following conference series:

International Conference on Distributed Artificial Intelligence


Abstract

Automated warehouses are widely deployed in large-scale distribution centers because they reduce operational cost and improve throughput capacity. In an automated warehouse, orders are fulfilled by battery-powered automated guided vehicles (AGVs) that transport movable shelves or boxes. Battery management is therefore crucial to productivity: recovering depleted batteries can be time-consuming, and it reduces the number of available robots, which can seriously degrade the overall performance of the system. In this paper, we propose to solve the battery management problem with deep reinforcement learning (DRL). We first formulate the battery management problem as a Markov Decision Process (MDP). We then show that the state-of-the-art DRL method, which uses Gaussian noise to enforce exploration, can perform poorly in the formulated MDP, and we present a novel algorithm called TD3-ARL that performs effective exploration by regulating the magnitude of the output action. Finally, extensive empirical evaluations confirm the superiority of our algorithm over the state-of-the-art and rule-based policies.
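
For readers unfamiliar with the exploration scheme the abstract refers to, the following is a minimal sketch of the usual TD3/DDPG-style rule: zero-mean Gaussian noise is added to the deterministic action and the result is clipped to the action bounds. It is not the authors' TD3-ARL, and the abstract does not say why this scheme struggles in their MDP. The function names, the bounds of [-1, 1], and sigma = 0.1 are all assumptions chosen for illustration. The sketch only demonstrates one generic property of the rule: when the actor's output sits at a bound, roughly half of the noisy samples are clipped back onto that bound.

```python
import random


def gaussian_exploration(action, sigma, low=-1.0, high=1.0, rng=random):
    """Perturb each action component with zero-mean Gaussian noise, then clip.

    Mirrors the common rule a = clip(pi(s) + eps, low, high), eps ~ N(0, sigma^2).
    The bounds and sigma are illustrative assumptions, not values from the paper.
    """
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    return [min(high, max(low, x)) for x in noisy]


def fraction_at_bound(action_value, sigma, trials=10_000, seed=0):
    """Estimate how often a 1-D noisy action lands exactly on a bound."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        (a,) = gaussian_exploration([action_value], sigma, rng=rng)
        if a in (-1.0, 1.0):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical actor outputs: one mid-range, one saturated at the upper bound.
    print("mid-range actor output:", fraction_at_bound(0.0, sigma=0.1))
    print("saturated actor output:", fraction_at_bound(1.0, sigma=0.1))
```

Under these assumptions, a saturated output wastes about half of its exploration samples on the same boundary action, while a mid-range output almost never does. Whether this is the failure mode the paper analyses cannot be determined from the abstract alone.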

Acknowledgements

This work was supported by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Joint Research Institute (JRI), Nanyang Technological University, Singapore.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Yanchen Deng & Bo An
Cainiao Smart Logistics Network, Hangzhou, China
Zongmin Qiu, Liuxi Li, Yong Wang & Yinghui Xu

Corresponding author

Correspondence to Yanchen Deng.

Editor information

Editors and Affiliations

University of Alberta, Edmonton, AB, Canada
Matthew E. Taylor
Nanjing University, Nanjing, China
Yang Yu
University of Oxford, Oxford, UK
Edith Elkind
Nanjing University, Nanjing, China
Yang Gao

About this paper

Cite this paper

Deng, Y., An, B., Qiu, Z., Li, L., Wang, Y., Xu, Y. (2020). Battery Management for Automated Warehouses via Deep Reinforcement Learning. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham. https://doi.org/10.1007/978-3-030-64096-5_9

DOI: https://doi.org/10.1007/978-3-030-64096-5_9
Published: 25 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64095-8
Online ISBN: 978-3-030-64096-5
eBook Packages: Computer Science, Computer Science (R0)