
Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning

Chapter in: Sequence Learning

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1828)

Abstract

Reinforcement Learning (RL) procedures have been established as powerful and practical methods for solving Markov Decision Problems. One of the most significant and actively investigated RL algorithms is Q-learning (Watkins, 1989). Q-learning is an algorithm for learning to estimate the long-term expected reward for a given state-action pair. It has the nice property that it does not need a model of the environment, and it can be used for on-line learning. A number of powerful convergence proofs have been given showing that Q-learning is guaranteed to converge in cases where the state space is small enough that lookup table representations can be used (Watkins and Dayan, 1992). Furthermore, in large state spaces where lookup table representations are infeasible, RL methods can be combined with function approximators to give good practical performance despite the lack of theoretical guarantees of convergence to optimal policies.
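
To make the abstract's description concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy exploration policy. The environment interface (reset/step/actions) and all hyperparameter values are illustrative assumptions, not details taken from the chapter.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn Q(s, a), the long-term expected discounted reward of taking
    action a in state s, directly from interaction with the environment.
    The env interface (reset/step/actions) is a hypothetical, assumed API."""
    Q = defaultdict(float)  # lookup-table representation: (state, action) -> value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection: explore occasionally, otherwise
            # take the action with the highest current Q estimate.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: move Q(s, a) toward the bootstrapped target
            # r + gamma * max_a' Q(s', a'); no model of the environment is needed.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state

    return Q
```

In small state spaces this lookup-table form is the setting covered by the convergence results of Watkins and Dayan (1992); in large state spaces the table would be replaced by a function approximator such as a neural network, as the abstract notes.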


References

  • R. H. Crites and A. G. Barto (1996). “Improving elevator performance using reinforcement learning.” In: D. Touretzky et al., eds., Advances in Neural Information Processing Systems 8, 1017–1023, MIT Press.

  • A. Greenwald and J. O. Kephart (1999). “Shopbots and pricebots.” Proceedings of IJCAI-99, 506–511.

  • J. Hu and M. P. Wellman (1996). “Self-fulfilling bias in multiagent learning.” Proceedings of ICMAS-96, AAAI Press.

  • J. Hu and M. P. Wellman (1998). “Multiagent reinforcement learning: theoretical framework and an algorithm.” Proceedings of ICML-98, Morgan Kaufmann.

  • J. O. Kephart, J. E. Hanson and J. Sairamesh (1998). “Price-war dynamics in a free-market economy of software agents.” In: Proceedings of ALIFE-VI, Los Angeles.

  • D. Kreps (1990). A Course in Microeconomic Theory. Princeton, NJ: Princeton University Press.

  • M. L. Littman (1994). “Markov games as a framework for multi-agent reinforcement learning.” Proceedings of the Eleventh International Conference on Machine Learning, 157–163, Morgan Kaufmann.

  • J. Sairamesh and J. O. Kephart (1998). “Dynamics of price and quality differentiation in information and computational markets.” Proceedings of the First International Conference on Information and Computation Economics (ICE-98), 28–36, ACM Press.

  • T. W. Sandholm and R. H. Crites (1995). “On multiagent Q-Learning in a semi-competitive domain.” 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Workshop on Adaptation and Learning in Multiagent Systems, Montreal, Canada, 71–77.

  • P. Stone and M. Veloso (1999). “Team-partitioned, opaque-transition reinforcement learning.” Proceedings of the Third International Conference on Autonomous Agents, 206–212. New York: ACM Press.

  • G. Tesauro (1995). “Temporal difference learning and TD-Gammon.” Communications of the ACM, 38:3, 58–67.

  • G. J. Tesauro and J. O. Kephart (2000). “Foresight-based pricing algorithms in agent economies.” Decision Support Sciences, to appear.

  • J. M. Vidal and E. H. Durfee (1998). “Learning nested agent models in an information economy.” Journal of Experimental and Theoretical AI, 10(3), 291–308.

  • C. J. C. H. Watkins (1989). “Learning from delayed rewards.” Doctoral dissertation, Cambridge University.

  • C. J. C. H. Watkins and P. Dayan (1992). “Q-learning.” Machine Learning 8, 279–292.

  • W. Zhang and T. G. Dietterich (1996). “High-performance job-shop scheduling with a time-delay TD(λ) network.” In: D. Touretzky et al., eds., Advances in Neural Information Processing Systems 8, 1024–1030, MIT Press.



Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Tesauro, G. (2000). Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science (LNAI), vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_13


  • DOI: https://doi.org/10.1007/3-540-44565-X_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41597-8

  • Online ISBN: 978-3-540-44565-4

