A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem

Streeter, Matthew J.; Smith, Stephen F.

doi:10.1007/11889205_40

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4204)

Included in the following conference series:

International Conference on Principles and Practice of Constraint Programming

Abstract

The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
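
To make the objective in the abstract concrete, the following is a minimal Python sketch of the problem setting only; it is not the algorithm presented in the paper. It simulates k arms with fixed payoff distributions and scores an allocation rule by the single best payoff seen in n trials, rather than by the sum of payoffs as in the classical k-armed bandit problem. The names `run_trials`, `round_robin`, and `make_greedy`, the epsilon-greedy rule, and the Gaussian arms are all illustrative assumptions.

```python
import random


def run_trials(arms, n, choose, rng):
    """Allocate n trials among the arms and return (max payoff seen, pulls per arm).

    `arms` is a list of functions rng -> payoff (hypothetical payoff samplers).
    `choose` is an allocation rule: (t, best, pulls, rng) -> arm index.
    """
    best = [float("-inf")] * len(arms)  # best payoff observed on each arm
    pulls = [0] * len(arms)             # number of trials given to each arm
    for t in range(n):
        i = choose(t, best, pulls, rng)
        best[i] = max(best[i], arms[i](rng))
        pulls[i] += 1
    # The max k-armed objective: the maximum payoff over all n trials.
    return max(best), pulls


def round_robin(t, best, pulls, rng):
    """Baseline rule: cycle through the arms in order."""
    return t % len(pulls)


def make_greedy(eps):
    """Illustrative rule (an assumption, not the paper's method): pull each arm
    once, then mostly pull the arm with the best payoff seen so far."""
    def choose(t, best, pulls, rng):
        k = len(pulls)
        if t < k:
            return t
        if rng.random() < eps:
            return rng.randrange(k)
        return max(range(k), key=lambda i: best[i])
    return choose


if __name__ == "__main__":
    # Arm 1 has the lower mean but the heavier upper tail, so it is the
    # better arm under the max objective even though it is worse on average.
    arms = [lambda rng: rng.gauss(0.0, 1.0), lambda rng: rng.gauss(-0.5, 3.0)]
    for name, rule in [("round robin", round_robin), ("greedy", make_greedy(0.1))]:
        best, pulls = run_trials(arms, 1000, rule, random.Random(0))
        print(name, round(best, 3), pulls)
```

The example arms are chosen to show why the max objective differs from the classical one: an arm with a lower mean can still be the right choice if its upper tail is heavier.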

Author information

Authors and Affiliations

Computer Science Department and Center for the Neural Basis of Cognition,
Matthew J. Streeter
The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213
Stephen F. Smith

Editor information

Editors and Affiliations

LINA UMR CNRS 6241, University of Nantes,
Frédéric Benhamou

About this paper

Cite this paper

Streeter, M.J., Smith, S.F. (2006). A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem. In: Benhamou, F. (eds) Principles and Practice of Constraint Programming - CP 2006. CP 2006. Lecture Notes in Computer Science, vol 4204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11889205_40

DOI: https://doi.org/10.1007/11889205_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46267-5
Online ISBN: 978-3-540-46268-2
eBook Packages: Computer Science, Computer Science (R0)