Abstract
This paper investigates learning-based agents that mimic human behavior in game playing, a central task in computational economics. Although computational economists have developed a variety of game-playing agents, well-established machine learning methods such as graphical models have not previously been applied to this task. Leveraging probabilistic graphical models, this paper presents a novel sequential Bayesian network (SBN) framework for building artificial game-playing agents. We show that many existing agents, including those based on reinforcement learning, fictitious play, and many of their variants, admit a unified Bayesian explanation within the proposed SBN framework. Moreover, SBN can handle several important game-playing settings, giving it broad scope in economics. SBN not only provides a unifying framework that explains existing learning approaches in virtual economies, but also enables the development of new algorithms that are stronger or operate under fewer restrictions. In this paper, we derive a new algorithm, Hidden Markovian Play (HMP), from the generic SBN model to handle an important but difficult setting in which a player can observe neither the opponent's strategy nor the opponent's payoff. HMP leverages Markovian learning to infer this unobservable information, leading to higher-quality agents. Evaluations on data from real-world field experiments in economics show that our HMP model outperforms baseline algorithms for building artificial agents.
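The core idea behind HMP, as the abstract describes it, is to treat the opponent's unobservable strategy as a hidden state and infer it from observed play. A minimal sketch of that inference step is a hidden-Markov-model forward update over opponent strategy states; all states, matrices, and numbers below are invented for illustration and are not taken from the paper.

```python
def forward_step(belief, trans, emit, obs):
    """One HMM forward update: predict the hidden opponent-strategy
    state through the transition model, then condition on the
    opponent action actually observed this round."""
    n = len(belief)
    # Prediction: propagate the current belief through the transition model.
    predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight each state by the likelihood of the observed action.
    unnorm = [predicted[j] * emit[j][obs] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two hypothetical hidden opponent strategies:
#   state 0 mostly plays action 0, state 1 mostly plays action 1.
trans = [[0.9, 0.1],
         [0.1, 0.9]]            # strategies are sticky across rounds
emit  = [[0.8, 0.2],
         [0.2, 0.8]]            # per-state action probabilities
belief = [0.5, 0.5]             # uniform prior over opponent strategies

for obs in [1, 1, 1]:           # opponent repeatedly plays action 1
    belief = forward_step(belief, trans, emit, obs)

# Belief mass shifts toward state 1, the strategy that favors action 1.
```

The agent would then best-respond against the expected opponent strategy under this belief; the full SBN/HMP model in the paper embeds this inference in a richer sequential network.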
Cite this article
Chen, W., Chen, Y. & Levine, D.K. A unifying learning framework for building artificial game-playing agents. Ann Math Artif Intell 73, 335–358 (2015). https://doi.org/10.1007/s10472-015-9450-1