A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem

  • Conference paper
Principles and Practice of Constraint Programming - CP 2006 (CP 2006)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4204)

Abstract

The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoffs from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper, we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
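To make the objective in the abstract concrete, the following is a minimal Python sketch of the problem setting only. It is not the algorithm presented in the paper, which the abstract does not specify. The names `play_max_bandit`, `round_robin`, `steady`, and `spiky`, as well as the two payoff distributions, are assumptions introduced purely for illustration.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical types: an arm is any function that draws one payoff from its
# own fixed distribution, which the player does not know in advance.
Arm = Callable[[random.Random], float]
Policy = Callable[[int, Sequence[List[float]]], int]


def round_robin(t: int, history: Sequence[List[float]]) -> int:
    """Placeholder policy: cycle through the arms, ignoring observed payoffs."""
    return t % len(history)


def play_max_bandit(arms: Sequence[Arm], policy: Policy, n: int, seed: int = 0) -> float:
    """Run n trials and return the single best payoff seen (not the sum)."""
    rng = random.Random(seed)
    history: List[List[float]] = [[] for _ in arms]
    best = float("-inf")
    for t in range(n):
        i = policy(t, history)      # choose which machine to play
        payoff = arms[i](rng)       # draw one payoff from that machine
        history[i].append(payoff)
        best = max(best, payoff)    # the objective is the maximum payoff
    return best


# Two made-up arms: one with a higher mean, one with a higher maximum.
steady = lambda rng: rng.uniform(0.4, 0.6)
spiky = lambda rng: 1.0 if rng.random() < 0.05 else 0.0

if __name__ == "__main__":
    print(play_max_bandit([steady, spiky], round_robin, n=200))
```

In this illustration, `steady` has the higher mean payoff and would be preferred under the classical k-armed bandit objective, whereas `spiky` can produce the larger single payoff and is therefore the better arm under the max objective once enough trials are available. A real policy would replace `round_robin` and use `history` to decide where to spend the remaining trials.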


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Streeter, M.J., Smith, S.F. (2006). A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem. In: Benhamou, F. (eds) Principles and Practice of Constraint Programming - CP 2006. CP 2006. Lecture Notes in Computer Science, vol 4204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11889205_40

  • DOI: https://doi.org/10.1007/11889205_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46267-5

  • Online ISBN: 978-3-540-46268-2

  • eBook Packages: Computer Science (R0)