
Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning

Chapter in: Sequence Learning

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1828)

Abstract

Reinforcement Learning (RL) procedures have been established as powerful and practical methods for solving Markov Decision Problems. One of the most significant and actively investigated RL algorithms is Q-learning (Watkins, 1989). Q-learning is an algorithm for learning to estimate the long-term expected reward for a given state-action pair. It has the nice property that it does not need a model of the environment, and it can be used for on-line learning. A number of powerful convergence proofs have been given showing that Q-learning is guaranteed to converge in cases where the state space is small enough that lookup table representations can be used (Watkins and Dayan, 1992). Furthermore, in large state spaces where lookup table representations are infeasible, RL methods can be combined with function approximators to give good practical performance despite the lack of theoretical guarantees of convergence to optimal policies.
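
To make the abstract's description concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy exploration policy. The environment interface (reset/step/actions) and all hyperparameter values are illustrative assumptions, not details taken from the chapter.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn Q(s, a), the long-term expected discounted reward of taking
    action a in state s, directly from interaction with the environment.
    The env interface (reset/step/actions) is a hypothetical, assumed API."""
    Q = defaultdict(float)  # lookup-table representation: (state, action) -> value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection: explore occasionally, otherwise
            # take the action with the highest current Q estimate.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: move Q(s, a) toward the bootstrapped target
            # r + gamma * max_a' Q(s', a'); no model of the environment is needed.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state

    return Q
```

In small state spaces this lookup-table form is the setting covered by the convergence results of Watkins and Dayan (1992); in large state spaces the table would be replaced by a function approximator such as a neural network, as the abstract notes.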


References

  • R. H. Crites and A. G. Barto (1996). “Improving elevator performance using reinforcement learning.” In: D. Touretzky et al., eds., Advances in Neural Information Processing Systems 8, 1017–1023, MIT Press.

  • A. Greenwald and J. O. Kephart (1999). “Shopbots and pricebots.” Proceedings of IJCAI-99, 506–511.

  • J. Hu and M. P. Wellman (1996). “Self-fulfilling bias in multiagent learning.” Proceedings of ICMAS-96, AAAI Press.

  • J. Hu and M. P. Wellman (1998). “Multiagent reinforcement learning: theoretical framework and an algorithm.” Proceedings of ICML-98, Morgan Kaufmann.

  • J. O. Kephart, J. E. Hanson and J. Sairamesh (1998). “Price-war dynamics in a free-market economy of software agents.” In: Proceedings of ALIFE-VI, Los Angeles.

  • D. Kreps (1990). A Course in Microeconomic Theory. Princeton, NJ: Princeton University Press.

  • M. L. Littman (1994). “Markov games as a framework for multi-agent reinforcement learning.” Proceedings of the Eleventh International Conference on Machine Learning, 157–163, Morgan Kaufmann.

  • J. Sairamesh and J. O. Kephart (1998). “Dynamics of price and quality differentiation in information and computational markets.” Proceedings of the First International Conference on Information and Computation Economics (ICE-98), 28–36, ACM Press.

  • T. W. Sandholm and R. H. Crites (1995). “On multiagent Q-Learning in a semi-competitive domain.” 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Workshop on Adaptation and Learning in Multiagent Systems, Montreal, Canada, 71–77.

  • P. Stone and M. Veloso (1999). “Team-partitioned, opaque-transition reinforcement learning.” Proceedings of the Third International Conference on Autonomous Agents, 206–212. New York: ACM Press.

  • G. Tesauro (1995). “Temporal difference learning and TD-Gammon.” Communications of the ACM, 38:3, 58–67.

  • G. J. Tesauro and J. O. Kephart (2000). “Foresight-based pricing algorithms in agent economies.” Decision Support Sciences, to appear.

  • J. M. Vidal and E. H. Durfee (1998). “Learning nested agent models in an information economy.” Journal of Experimental and Theoretical AI, 10(3), 291–308.

  • C. J. C. H. Watkins (1989). “Learning from delayed rewards.” Doctoral dissertation, Cambridge University.

  • C. J. C. H. Watkins and P. Dayan (1992). “Q-learning.” Machine Learning 8, 279–292.

  • W. Zhang and T. G. Dietterich (1996). “High-performance job-shop scheduling with a time-delay TD(λ) network.” In: D. Touretzky et al., eds., Advances in Neural Information Processing Systems 8, 1024–1030, MIT Press.



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tesauro, G. (2000). Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science (LNAI), vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_13


  • DOI: https://doi.org/10.1007/3-540-44565-X_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41597-8

  • Online ISBN: 978-3-540-44565-4

