Abstract
Making rational decisions in sequential decision problems within complex environments has challenged researchers in various fields for decades. Such problems involve state transition dynamics, stochastic uncertainties, long-term utilities, and other factors that raise high barriers, including the curse of dimensionality. Recently developed state-of-the-art reinforcement learning algorithms offer strong potential to break these barriers efficiently and make it possible to handle complex, practical decision problems with decent performance. We propose a formulation of a velocity-varying one-on-one quadrotor robot game in three-dimensional space, together with an approximate dynamic programming approach that uses a projected policy iteration method to learn the utilities of game states and improve motion policies. In addition, a simulation-based iterative scheme is employed to overcome the curse of dimensionality. Simulation results demonstrate that the proposed decision strategy can generate effective and efficient motion policies that contend with the opponent quadrotor and gain an advantageous status during the game. Flight experiments conducted in the Networked Autonomous Vehicles (NAV) Lab at Concordia University further validate the performance of the proposed decision strategy in a real-time environment.
Project supported by the National Natural Science Foundation of China (Nos. 61573282 and 61833013), the Scholarships from China Scholarship Council (No. 201606100139), and the Natural Sciences and Engineering Research Council of Canada
Cite this article
Zhang, Ld., Wang, B., Liu, Zx. et al. Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method. Frontiers Inf Technol Electronic Eng 20, 525–537 (2019). https://doi.org/10.1631/FITEE.1800571
Key words
- Reinforcement learning
- Approximate dynamic programming
- Decision making
- Motion planning
- Unmanned aerial vehicle