Online Learning with Variable Stage Duration

Conference paper
Learning Theory (COLT 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4005)

Abstract

We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well-known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the observed actions of the opponent. In the present paper we consider the extended model where the duration of each stage of the game may depend on the actions of both players, while the performance measure of interest is the average payoff per unit time. We start the analysis of online learning in repeated games with variable stage duration by showing that no-regret strategies, in the above sense, do not exist in general. Consequently, we consider two classes of adaptive strategies, one based on Blackwell’s approachability theorem and the other on calibrated forecasts, and examine their performance guarantees. In either case we show that the long-term average payoff is higher than a certain function of the empirical distribution of the opponent’s actions, and in particular is strictly higher than the minimax value of the repeated game whenever that empirical distribution deviates from a minimax strategy in the stage game.
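To make the performance measure concrete, the following minimal Python sketch computes the average payoff per unit time for a finished play sequence and compares it with the best fixed action in hindsight. It is only an illustration under stated assumptions: the matrices `rewards` and `durations`, the function names, and the reading of "best fixed action in hindsight" as the action maximizing total reward divided by total time against the observed opponent actions are choices made here, not definitions taken from the paper.

```python
from collections import Counter


def average_payoff_per_unit_time(rewards, durations, my_actions, opp_actions):
    """Total reward divided by total elapsed time over the played stages.

    rewards[i][j] and durations[i][j] (hypothetical names) give the stage
    reward and the positive stage duration when we play i and the opponent
    plays j.
    """
    pairs = list(zip(my_actions, opp_actions))
    total_reward = sum(rewards[i][j] for i, j in pairs)
    total_time = sum(durations[i][j] for i, j in pairs)
    return total_reward / total_time


def best_fixed_action_in_hindsight(rewards, durations, opp_actions):
    """Return (action, rate) for the fixed action with the highest
    reward-per-unit-time against the observed opponent actions."""
    counts = Counter(opp_actions)
    best_action, best_rate = None, float("-inf")
    for i in range(len(rewards)):
        reward = sum(n * rewards[i][j] for j, n in counts.items())
        time = sum(n * durations[i][j] for j, n in counts.items())
        rate = reward / time
        if rate > best_rate:
            best_action, best_rate = i, rate
    return best_action, best_rate


def hindsight_regret(rewards, durations, my_actions, opp_actions):
    """Gap between the best fixed-action rate and the achieved rate
    (an assumed notion of regret for this illustration)."""
    _, best_rate = best_fixed_action_in_hindsight(rewards, durations, opp_actions)
    achieved = average_payoff_per_unit_time(
        rewards, durations, my_actions, opp_actions
    )
    return best_rate - achieved
```

Because both sums run over the same stages, the ratio depends on the opponent's play only through the empirical distribution of its actions, which is why the abstract's guarantees are stated as functions of that distribution. With all durations equal to one, the quantities reduce to the usual per-stage average payoff of a repeated matrix game.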

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mannor, S., Shimkin, N. (2006). Online Learning with Variable Stage Duration. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_31

  • DOI: https://doi.org/10.1007/11776420_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35294-5

  • Online ISBN: 978-3-540-35296-9

  • eBook Packages: Computer Science, Computer Science (R0)
