Merging Individually Learned Optimal Results to Accelerate Coordination

  • Conference paper
Advances in Web-Age Information Management (WAIM 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3129)


Abstract

By merging their individually learned optimal value functions, agents in a multiagent system can learn their optimal policies. Prior knowledge of the task is used to decompose it into several subtasks, and this decomposition greatly reduces the state and action spaces. The optimal value function of each subtask is learned with the MAXQ-Q algorithm [1]. By defining lower and upper bounds on the value function of the whole task, we propose a novel online multiagent learning algorithm, LU-Q, which accelerates the learning of coordination between multiple agents through task decomposition and action pruning.
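
The central mechanism named in the abstract, pruning joint actions whose value bounds rule them out, can be illustrated with a minimal sketch. The Python fragment below is a hypothetical illustration, not the authors' LU-Q implementation: the function name prune_joint_actions, the bound dictionaries, and the toy values are all assumptions. It shows the basic idea that a joint action can be discarded as soon as its upper bound falls below the best lower bound guaranteed by some other joint action, so subsequent learning of coordination only searches the survivors.

    from itertools import product

    def prune_joint_actions(lower, upper, joint_actions):
        """Keep only joint actions that are not provably suboptimal.

        lower/upper map each joint action to a lower/upper bound on the
        value of the whole task, e.g. assembled from each agent's
        individually learned subtask value functions (an assumption of
        this sketch, not the paper's exact construction).
        """
        # Best value already guaranteed by some joint action.
        best_lower = max(lower[a] for a in joint_actions)
        # Discard any joint action whose optimistic estimate cannot even
        # reach that guaranteed value.
        return [a for a in joint_actions if upper[a] >= best_lower]

    # Toy usage (assumed values): two agents, two primitive actions each.
    joint_actions = list(product(["left", "right"], repeat=2))
    lower = {a: 0.0 for a in joint_actions}
    upper = {a: 1.0 for a in joint_actions}
    lower[("left", "right")] = 0.8   # a well-explored, promising joint action
    upper[("right", "right")] = 0.5  # provably worse, so it is pruned
    print(prune_joint_actions(lower, upper, joint_actions))

In this sketch the bounds are fixed; in an online algorithm such as LU-Q they would presumably be tightened as learning proceeds, so the set of joint actions the agents must coordinate over keeps shrinking, which is where the claimed acceleration comes from.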


References

  1. Dietterich, T.G.: Hierarchical Reinforcement Learning With the MAXQ Value Function Decomposition. J. of Artificial Intelligence Research 13, 227–303 (2000)

  2. Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, Cambridge University, Cambridge, UK (1989)

  3. Singh, S., Cohn, D.: How to Dynamically Merge Markov Decision Processes. In: 17th Int. Conference on Neural Information Processing Systems (1999)

  4. Ghavamzadeh, M., Mahadevan, S.: A Multiagent Reinforcement Learning Algorithm by Dynamically Merging Markov Decision Processes. In: 1st Int. Joint Conference on Autonomous Agents and Multiagent Systems, Bologna (2002)

  5. Boutilier, C.: Sequential Optimality and Coordination in Multiagent Systems. In: 16th Int. Joint Conference on Artificial Intelligence, Stockholm, pp. 478–485 (1999)

  6. Littman, M.L.: Markov Games as a Framework for Multi-Agent Reinforcement Learning. In: 11th Int. Conference of Machine Learning, New Brunswick, pp. 157–163 (1994)

  7. Hu, J., Wellman, M.P.: Nash Q-Learning for General-Sum Stochastic Games. J. of Machine Learning Research 1, 1–30 (2003)

  8. Greenwald, A., Hall, K., Serrano, R.: Correlated-Q Learning. In: 20th Int. Conference on Neural Information Processing Systems, Workshop on Multiagent Learning (2002)


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Huang, S. (2004). Merging Individually Learned Optimal Results to Accelerate Coordination. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_64

  • DOI: https://doi.org/10.1007/978-3-540-27772-9_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22418-1

  • Online ISBN: 978-3-540-27772-9
