Online Sparse Bandit for Card Games

St-Pierre, David L.; Louveaux, Quentin; Teytaud, Olivier

doi:10.1007/978-3-642-31866-5_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7168)

Included in the following conference series:

Advances in Computer Games

Abstract

Finding an approximation of a Nash equilibrium in matrix games is an important topic that reaches beyond the strict application to matrix games. A bandit algorithm commonly used to approximate a Nash equilibrium is EXP3 [3]. However, the solutions to many problems are sparse, and EXP3 inherently fails to exploit this property. To the best of the authors' knowledge, the only existing way to handle this issue is the offline truncation proposed in [9]. In this paper, we propose a variation of EXP3 that exploits the sparsity of the solution by dynamically removing arms; the resulting algorithm empirically performs better than previous versions. We apply the resulting algorithm to an MCTS program for the Urban Rivals card game.
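
For orientation, the following is a minimal Python sketch of the textbook EXP3 update referred to above, together with an illustrative offline truncation step. It is not the algorithm proposed in this paper, whose arm-removal rule is not given in the abstract. The function names, the exploration rate gamma, the weight rescaling, and the truncation threshold are all assumptions made for the example.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration (textbook EXP3)."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Return new weights; only the pulled arm changes.

    The reward is assumed to lie in [0, 1] and is divided by the
    probability of the pulled arm (importance weighting).
    """
    k = len(weights)
    estimate = reward / probs[arm]
    new_weights = list(weights)
    new_weights[arm] *= math.exp(gamma * estimate / k)
    return new_weights


def run_exp3(reward_fn, k, rounds, gamma=0.1, seed=0):
    """Run EXP3 for a number of rounds and return how often each arm was pulled."""
    rng = random.Random(seed)
    weights = [1.0] * k
    counts = [0] * k
    for _ in range(rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(arm, rng)
        weights = exp3_update(weights, probs, arm, reward, gamma)
        # Rescale so the largest weight is 1; this leaves the probabilities
        # unchanged and only guards against floating-point overflow.
        largest = max(weights)
        weights = [w / largest for w in weights]
        counts[arm] += 1
    return counts


def truncate_sparse(frequencies, threshold):
    """Illustrative offline truncation (hypothetical rule, not taken from [9]).

    Arms whose empirical frequency is below the threshold are set to zero
    and the remaining frequencies are renormalized.
    """
    kept = [f if f >= threshold else 0.0 for f in frequencies]
    total = sum(kept)
    if total == 0.0:
        # Nothing survived the threshold: fall back to the input unchanged.
        return list(frequencies)
    return [f / total for f in kept]


if __name__ == "__main__":
    # Toy problem: arm 0 always pays 1, the other arms pay 0.
    counts = run_exp3(lambda arm, rng: 1.0 if arm == 0 else 0.0, k=3, rounds=2000)
    frequencies = [c / sum(counts) for c in counts]
    print(truncate_sparse(frequencies, threshold=0.1))
```

In this sketch the truncation is applied once, after the run has finished, which corresponds to the offline setting mentioned in the abstract. An online variant would instead remove arms while the bandit is still running.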

References

1. Audibert, J.-Y., Bubeck, S.: Minimax policies for adversarial and stochastic bandits. In: 22nd Annual Conference on Learning Theory (COLT), Montreal (June 2009)
2. Audibert, J.-Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research (October 2010)
3. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)
4. Auger, D.: Multiple Tree for Partially Observable Monte-Carlo Tree Search. In: Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A.I., Merelo, J.J., Neri, F., Preuss, M., Richter, H., Togelius, J., Yannakakis, G.N. (eds.) EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 53–62. Springer, Heidelberg (2011)
5. Grigoriadis, M.D., Khachiyan, L.G.: A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters 18(2), 53–58 (1995)
6. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1), 4–22 (1985)
7. Lanctot, M., Waugh, K., Zinkevich, M., Bowling, M.: Monte Carlo Sampling for Regret Minimization in Extensive Games. Advances in Neural Information Processing Systems 22, 1078–1086 (2009)
8. Ponsen, M., Lanctot, M., de Jong, S.: MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling. In: Proceedings of Interactive Decision Theory and Game Theory Workshop, AAAI 2010 (2010)
9. Teytaud, O., Flory, S.: Upper Confidence Trees with Short Term Partial Information. In: Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A.I., Merelo, J.J., Neri, F., Preuss, M., Richter, H., Togelius, J., Yannakakis, G.N. (eds.) EvoApplications 2011, Part I. LNCS, vol. 6624, pp. 153–162. Springer, Heidelberg (2011)

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, Faculty of Engineering, Liège University, Belgium
David L. St-Pierre & Quentin Louveaux
TAO (Inria, Lri, Univ. Paris-Sud, UMR CNRS 8623), France
Olivier Teytaud
OASE Lab., National University of Tainan, Taiwan
Olivier Teytaud

Editor information

Editors and Affiliations

Tilburg Institute of Cognition and Communication, Tilburg University, Warandelaan 2, 5037 AB, Tilburg, The Netherlands
H. Jaap van den Herik & Aske Plaat

About this paper

Cite this paper

St-Pierre, D.L., Louveaux, Q., Teytaud, O. (2012). Online Sparse Bandit for Card Games. In: van den Herik, H.J., Plaat, A. (eds) Advances in Computer Games. ACG 2011. Lecture Notes in Computer Science, vol 7168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31866-5_25

DOI: https://doi.org/10.1007/978-3-642-31866-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31865-8
Online ISBN: 978-3-642-31866-5
eBook Packages: Computer Science, Computer Science (R0)
