Abstract
This article presents two new algorithms for finding the optimal solution of a multi-agent multi-objective reinforcement learning problem. Both algorithms combine modularization with acceleration by a heuristic function, two concepts applied in standard reinforcement learning algorithms, in order to simplify and speed up learning for an agent that operates in a multi-agent multi-objective environment. To evaluate the proposed algorithms, we considered a predator-prey environment in which the learning agent plays the role of a prey that must escape a pursuing predator while reaching food at a fixed location. The results show that combining modularization with acceleration by a heuristic function indeed simplifies and speeds up learning in a complex problem when compared with algorithms that use neither technique, such as Q-learning and Minimax-Q.
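To make the idea of heuristic acceleration concrete, the sketch below shows how a heuristic function H can bias epsilon-greedy action selection toward actions suggested by prior knowledge, in the spirit of heuristically accelerated Q-learning. The array shapes, the weight `xi`, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def choose_action(Q, H, state, xi=1.0, epsilon=0.1, rng=None):
    """Heuristically accelerated epsilon-greedy action selection.

    With probability 1 - epsilon, pick the action that maximizes
    Q(s, a) + xi * H(s, a); the heuristic H nudges the agent toward
    actions a designer believes are good, without changing the Q
    update itself.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return int(rng.integers(Q.shape[1]))
    # Exploit: greedy over the heuristic-biased value estimates.
    return int(np.argmax(Q[state] + xi * H[state]))
```

Because H only influences action selection, a poor heuristic slows learning down but does not prevent convergence of the underlying Q-values.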
Notes
1 The modules used in this problem were: navigate the room, do not hit a wall, pass through a door, find the recharging base, and recharge the battery.
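Modularization of this kind is often combined through "greatest-mass" action selection, where each module keeps its own Q-table for a sub-task and the agent acts on the summed values. The sketch below illustrates that combination rule; the function name and table layout are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def modular_action(module_qs, state):
    """Greatest-mass action selection over task modules.

    Each entry of module_qs is a Q-table (states x actions) for one
    sub-task, e.g. 'do not hit a wall' or 'recharge the battery'.
    The combined policy prefers actions that score well across
    several modules at once, not just the favorite of one module.
    """
    combined = sum(q[state] for q in module_qs)
    return int(np.argmax(combined))
```

A compromise action can win even when no single module ranks it first, which is the main appeal of summing over modules rather than letting one module dictate the action.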
Acknowledgements
The authors would like to thank the National Laboratory for Scientific Computing (LNCC) for providing equipment that allowed the realization of the experiments. Leonardo Anjoletto Ferreira acknowledges support from CNPq (grant 151521/2010-7) and CAPES. Carlos H. C. Ribeiro thanks CNPq (grant 305772/2010-4).
Cite this article
Ferreira, L.A., Costa Ribeiro, C.H. & da Costa Bianchi, R.A. Heuristically accelerated reinforcement learning modularization for multi-agent multi-objective problems. Appl Intell 41, 551–562 (2014). https://doi.org/10.1007/s10489-014-0534-0