Abstract
This article presents two new algorithms for finding the optimal solution of a multi-agent multi-objective reinforcement learning problem. Both algorithms combine modularization with acceleration by a heuristic function, two concepts applied in standard reinforcement learning algorithms, in order to simplify and speed up learning for an agent that operates in a multi-agent multi-objective environment. To evaluate the proposed algorithms, we considered a predator-prey environment in which the learning agent plays the role of a prey that must escape a pursuing predator while reaching food at a fixed location. The results show that combining modularization with acceleration by a heuristic function indeed simplifies and speeds up learning in a complex problem when compared with algorithms that use neither technique, such as Q-learning and Minimax-Q.
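To make the idea of heuristic acceleration concrete, the sketch below shows how a heuristic function H can bias epsilon-greedy action selection toward actions suggested by prior knowledge, in the spirit of heuristically accelerated Q-learning. The array shapes, the weight `xi`, and the function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def choose_action(Q, H, state, xi=1.0, epsilon=0.1, rng=None):
    """Heuristically accelerated epsilon-greedy action selection.

    With probability 1 - epsilon, pick the action that maximizes
    Q(s, a) + xi * H(s, a); the heuristic H nudges the agent toward
    actions a designer believes are good, without changing the Q
    update itself.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return int(rng.integers(Q.shape[1]))
    # Exploit: greedy over the heuristic-biased value estimates.
    return int(np.argmax(Q[state] + xi * H[state]))
```

Because H only influences action selection, a poor heuristic slows learning down but does not prevent convergence of the underlying Q-values.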
Notes
1 The modules used in this problem were: navigate the room, do not hit a wall, pass through a door, find the recharging base, and recharge the battery.
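Modularization of this kind is often combined through "greatest-mass" action selection, where each module keeps its own Q-table for a sub-task and the agent acts on the summed values. The sketch below illustrates that combination rule; the function name and table layout are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def modular_action(module_qs, state):
    """Greatest-mass action selection over task modules.

    Each entry of module_qs is a Q-table (states x actions) for one
    sub-task, e.g. 'do not hit a wall' or 'recharge the battery'.
    The combined policy prefers actions that score well across
    several modules at once, not just the favorite of one module.
    """
    combined = sum(q[state] for q in module_qs)
    return int(np.argmax(combined))
```

A compromise action can win even when no single module ranks it first, which is the main appeal of summing over modules rather than letting one module dictate the action.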
Acknowledgements
The authors would like to thank the National Laboratory for Scientific Computing (LNCC) for providing equipment that allowed the realization of the experiments. Leonardo Anjoletto Ferreira acknowledges support from CNPq (grant 151521/2010-7) and CAPES. Carlos H. C. Ribeiro thanks CNPq (grant 305772/2010-4).
Cite this article
Ferreira, L.A., Costa Ribeiro, C.H. & da Costa Bianchi, R.A. Heuristically accelerated reinforcement learning modularization for multi-agent multi-objective problems. Appl Intell 41, 551–562 (2014). https://doi.org/10.1007/s10489-014-0534-0