Abstract
Task-space control requires an inverse kinematics solution or the Jacobian matrix to transform from task space to joint space. However, these are not always available for redundant robots, which have more joint degrees of freedom than Cartesian degrees of freedom. Intelligent learning methods, such as neural networks (NN) and reinforcement learning (RL), can learn the inverse kinematics solution. However, NNs need big data, and classical RL is not suitable for multi-link robots controlled in task space. In this paper, we propose a fully cooperative multi-agent reinforcement learning (MARL) scheme to solve the kinematic problem of redundant robots. Each joint of the robot is regarded as one agent. The fully cooperative MARL exploits the robot kinematics to avoid function approximators and a large learning space. The convergence property of the proposed MARL is analyzed. The experimental results show that our MARL performs much better than classical methods such as Jacobian-based methods and neural networks.
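To make the cooperative setting concrete, the sketch below is a minimal illustration (not the paper's algorithm): independent tabular Q-learning agents, one per joint of a hypothetical 3-link planar arm, all receiving one shared reward (the negative end-effector distance to a target), which makes the game fully cooperative. The link lengths, target, discretization, and learning parameters are all illustrative assumptions.

```python
import numpy as np

np.random.seed(0)
L = [0.5, 0.4, 0.3]                      # link lengths of a 3-link planar arm (redundant in 2-D)
TARGET = np.array([0.6, 0.6])            # task-space goal
ACTIONS = np.array([-0.05, 0.0, 0.05])   # per-joint angle increments (rad)

def fk(q):
    """Forward kinematics: end-effector position of the planar arm."""
    a = np.cumsum(q)
    return np.array([np.sum(L * np.cos(a)), np.sum(L * np.sin(a))])

def state(q):
    """Discretize the Cartesian error into a coarse grid (shared observation)."""
    e = TARGET - fk(q)
    return tuple(np.clip(((e + 1.0) * 5).astype(int), 0, 9))

# one Q-table per joint agent: Q[agent][state] -> action values
Q = [dict() for _ in L]
alpha, gamma, eps = 0.3, 0.9, 0.2

def qrow(i, s):
    return Q[i].setdefault(s, np.zeros(len(ACTIONS)))

for episode in range(2000):
    q = np.zeros(3)
    for step in range(50):
        s = state(q)
        # each agent independently picks a joint increment (eps-greedy)
        acts = [np.random.randint(3) if np.random.rand() < eps
                else int(np.argmax(qrow(i, s))) for i in range(3)]
        q = q + ACTIONS[acts]
        s2 = state(q)
        r = -np.linalg.norm(TARGET - fk(q))   # single shared (cooperative) reward
        for i in range(3):                    # independent Q-learning updates
            qrow(i, s)[acts[i]] += alpha * (r + gamma * qrow(i, s2).max()
                                            - qrow(i, s)[acts[i]])
        if -r < 0.05:
            break

# greedy rollout after learning
q = np.zeros(3)
for step in range(50):
    s = state(q)
    q = q + ACTIONS[[int(np.argmax(qrow(i, s))) for i in range(3)]]
err = np.linalg.norm(TARGET - fk(q))
print(round(err, 3))
```

Because the reward is shared, each agent's greedy improvement also improves the joint objective; the full paper replaces this generic tabular scheme with a kinematics-informed learning rule that avoids the coarse discretization used here.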
Notes
Task space (or Cartesian space) is defined by the position and orientation of the robot's end effector. Joint space is defined by the angular displacement of each joint of the robot.
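The footnote's distinction can be made concrete with a minimal forward-kinematics sketch for a hypothetical 3-link planar arm (link lengths are illustrative): the joint-to-task map is easy to evaluate, but for a redundant robot its Jacobian is non-square, so no exact inverse exists.

```python
import numpy as np

L = np.array([0.5, 0.4, 0.3])   # 3 joint angles, but only a 2-D task (x, y)

def forward_kinematics(q):
    """Map joint space (3 angles) to task space (planar end-effector position)."""
    a = np.cumsum(q)             # absolute link angles
    return np.array([np.sum(L * np.cos(a)), np.sum(L * np.sin(a))])

def jacobian(q, h=1e-6):
    """Numerical Jacobian of the joint-to-task map (forward differences)."""
    J = np.zeros((2, len(q)))
    for j in range(len(q)):
        dq = np.zeros(len(q))
        dq[j] = h
        J[:, j] = (forward_kinematics(q + dq) - forward_kinematics(q)) / h
    return J

J = jacobian(np.array([0.1, 0.2, 0.3]))
print(J.shape)   # (2, 3): non-square for a redundant arm, so it has no exact inverse
```

This is why redundant robots need a pseudoinverse, an optimization criterion, or a learned mapping rather than a direct Jacobian inversion.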
Cite this article
Perrusquía, A., Yu, W. & Li, X. Multi-agent reinforcement learning for redundant robot control in task-space. Int. J. Mach. Learn. & Cyber. 12, 231–241 (2021). https://doi.org/10.1007/s13042-020-01167-7