Online Learning with Variable Stage Duration

Mannor, Shie; Shimkin, Nahum

doi:10.1007/11776420_31

Shie Mannor²⁰ &
Nahum Shimkin²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4005))

Included in the following conference series:

International Conference on Computational Learning Theory

2707 Accesses

Abstract

We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the observed actions of the opponent. In the present paper we consider the extended model where the duration of each stage of the game may depend on the actions of both players, while the performance measure of interest is the average payoff per unit time. We start the analysis of online learning in repeated games with variable stage duration by showing that no-regret strategies, in the above sense, do not exist in general. Consequently, we consider two classes of adaptive strategies, one based on Blackwell’s approachability theorem and the other on calibrated forecasts, and examine their performance guarantees. In either case we show that the long-term average payoff is higher than a certain function of the empirical distribution of the opponent’s actions, and in particular is strictly higher than the minimax value of the repeated game whenever that empirical distribution deviates from a minimax strategy in the stage game.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002)
Article MathSciNet MATH Google Scholar
Blackwell, D.: An analog of the minimax theorem for vector payoffs. Pacific J. Math. 6(1), 1–8 (1956)
MathSciNet MATH Google Scholar
Blackwell, D.: Controlled random walks. In: Proc. Int. Congress of Mathematicians 1954, vol. 3, pp. 336–338. North Holland, Amsterdam (1956)
Google Scholar
Boyd, S., Vanderberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
MATH Google Scholar
Cesa-Bianchi, N., Lugosi, G.: Prediction, Learning, and Games. Cambridge University Press, New York (2006)
Book MATH Google Scholar
Foster, D.P., Vohra, R.: Regret in the on-line decision problem. Games and Economic Behavior 29, 7–35 (1999)
Article MathSciNet MATH Google Scholar
Foster, D.P., Vohra, R.V.: Calibrated learning and correlated equilibrium. Games and Economic Behavior 21, 40–55 (1997)
Article MathSciNet MATH Google Scholar
Foster, D.P., Vohra, R.V.: Asymptotic calibration. Biometrika 85, 379–390 (1998)
Article MathSciNet MATH Google Scholar
Freund, Y., Schapire, R.E.: Adaptive game playing using multiplicative weights. Games and Economic Behavior 29, 79–103 (1999)
Article MathSciNet MATH Google Scholar
Fudenberg, D., Levine, D.: Universal consistency and cautious fictitious play. Journal of Economic Dynamic and Control 19, 1065–1990 (1995)
Article MathSciNet MATH Google Scholar
Fudenberg, D., Levine, D.: An easier way to calibrate. Games and Economic Behavior 29, 131–137 (1999)
Article MathSciNet MATH Google Scholar
Hannan, J.: Approximation to Bayes Risk in Repeated Play. Contribution to The Theory of Games, vol. III, pp. 97–139. Princeton University Press, Princeton (1957)
Google Scholar
Kakade, S.M., Foster, D.P.: Deterministic calibration and nash equilibrium. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS (LNAI), vol. 3120, pp. 33–48. Springer, Heidelberg (2004)
Chapter Google Scholar
Lal, A.A., Sinha, S.: Zero-sum two-person semi-Markov games. J. Appl. Prob. 29, 56–72 (1992)
Article MathSciNet MATH Google Scholar
Mannor, S., Shimkin, N.: The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Mathematics of Operations Research 28(2), 327–345 (2003)
Article MathSciNet MATH Google Scholar
Mannor, S., Shimkin, N.: Regret minimization in repeated matrix games with variable stage duration. Technical Report EE-1524, Faculty of Electrical Engineering, Technion (February 2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Electrical and Computer Engingeering, McGill University, Québec, H3A-2A7
Shie Mannor
Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa, 32000, Israel
Nahum Shimkin

Authors

Shie Mannor
View author publications
You can also search for this author in PubMed Google Scholar
Nahum Shimkin
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

ICREA and Department of Economics, Universitat Pompeu Fabra, Ramon Trias Fargas 25-27, 08005, Barcelona, Spain
Gábor Lugosi
Ruhr-Universität Bochum, Germany
Hans Ulrich Simon

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mannor, S., Shimkin, N. (2006). Online Learning with Variable Stage Duration. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science(), vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_31

Download citation

DOI: https://doi.org/10.1007/11776420_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics