Abstract
Reinforcement Learning (RL) procedures have been established as powerful and practical methods for solving Markov Decision Problems. One of the most significant and actively investigated RL algorithms is Q-learning (Watkins, 1989). Q-learning is an algorithm for learning to estimate the long-term expected reward for a given state-action pair. It has the attractive properties that it does not need a model of the environment and that it can be used for on-line learning. A number of powerful convergence proofs show that Q-learning is guaranteed to converge when the state space is small enough for lookup-table representations to be used (Watkins and Dayan, 1992). Furthermore, in large state spaces where lookup-table representations are infeasible, RL methods can be combined with function approximators to give good practical performance, despite the lack of theoretical guarantees of convergence to optimal policies.
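The tabular update rule underlying Q-learning can be sketched as follows. This is a minimal illustration of the standard algorithm, not code from the chapter; the function name, the dictionary representation of the lookup table, and the parameter defaults are illustrative choices.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q       : dict mapping (state, action) -> estimated long-term reward
    s, a    : state visited and action taken
    r       : immediate reward observed
    s_next  : resulting state
    actions : iterable of actions available in s_next
    alpha   : learning rate; gamma : discount factor
    """
    old = Q.get((s, a), 0.0)
    # Bootstrapped target: immediate reward plus discounted value of the
    # best action in the next state (unseen entries default to 0).
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Note that the update uses only an observed transition `(s, a, r, s_next)`, which is why Q-learning needs no model of the environment and suits on-line learning.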
References
R. H. Crites and A. G. Barto, (1996). “Improving elevator performance using reinforcement learning.” In: D. Touretzky et al., eds., Advances in Neural Information Processing Systems 8, 1017–1023, MIT Press.
A. Greenwald and J. O. Kephart, (1999). “Shopbots and pricebots.” Proceedings of IJCAI-99, 506–511.
J. Hu and M. P. Wellman, (1996). “Self-fulfilling bias in multiagent learning.” Proceedings of ICMAS-96, AAAI Press.
J. Hu and M. P. Wellman, (1998). “Multiagent reinforcement learning: theoretical framework and an algorithm.” Proceedings of ICML-98, Morgan Kaufmann.
J. O. Kephart, J. E. Hanson and J. Sairamesh, (1998). “Price-war dynamics in a free-market economy of software agents.” In: Proceedings of ALIFE-VI, Los Angeles.
D. Kreps, (1990). A Course in Microeconomic Theory. Princeton NJ: Princeton University Press.
M. L. Littman, (1994). “Markov games as a framework for multi-agent reinforcement learning,” Proceedings of the Eleventh International Conference on Machine Learning, 157–163, Morgan Kaufmann.
J. Sairamesh and J. O. Kephart, (1998). “Dynamics of price and quality differentiation in information and computational markets.” Proceedings of the First International Conference on Information and Computation Economics (ICE-98), 28–36, ACM Press.
T. W. Sandholm and R. H. Crites, (1995). “On multiagent Q-Learning in a semi-competitive domain.” 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Workshop on Adaptation and Learning in Multiagent Systems, Montreal, Canada, 71–77.
P. Stone and M. Veloso, (1999). “Team-partitioned, opaque-transition reinforcement learning.” Proceedings of the Third International Conference on Autonomous Agents, 206–212. New York: ACM Press.
G. Tesauro, (1995). “Temporal difference learning and TD-Gammon.” Comm. of the ACM, 38:3, 58–67.
G. J. Tesauro and J. O. Kephart, (2000). “Foresight-based pricing algorithms in agent economies.” Decision Support Sciences, to appear.
J. M. Vidal and E. H. Durfee, (1998). “Learning nested agent models in an information economy,” J. of Experimental and Theoretical AI, 10(3), 291–308.
C. J. C. H. Watkins, (1989). “Learning from delayed rewards.” Doctoral dissertation, Cambridge University.
C. J. C. H. Watkins and P. Dayan, (1992). “Q-learning.” Machine Learning 8, 279–292.
W. Zhang and T. G. Dietterich, (1996). “High-performance job-shop scheduling with a time-delay TD(λ) network.” In: D. Touretzky et al., eds., Advances in Neural Information Processing Systems 8, 1024–1030, MIT Press.
© 2000 Springer-Verlag Berlin Heidelberg
Tesauro, G. (2000). Pricing in Agent Economies Using Neural Networks and Multi-agent Q-Learning. In: Sun, R., Giles, C.L. (eds) Sequence Learning. Lecture Notes in Computer Science(), vol 1828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44565-X_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41597-8
Online ISBN: 978-3-540-44565-4