Part of the book series: Wireless Networks (WN)

Abstract

In this chapter, we present the formulation, theoretical bound, and algorithms for the stochastic multi-armed bandit (MAB) problem. We also discuss several important variants of the stochastic MAB problem and their algorithms, including multi-play MAB, MAB with switching costs, and pure-exploration MAB.
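
Although the chapter develops the formulation and theoretical bound in full, a small sketch may help fix ideas. The following is a minimal, illustrative Python implementation of UCB1, the classical index policy of Auer, Cesa-Bianchi, and Fischer: play each arm once, then repeatedly play the arm whose empirical mean plus exploration bonus is largest. The `pull` callback and the Bernoulli toy arms are assumptions made for this sketch only, not part of the chapter.

```python
import math
import random

def ucb1(n_arms, horizon, pull):
    """Minimal UCB1 sketch; `pull(arm)` is assumed to return a reward in [0, 1]."""
    counts = [0] * n_arms   # number of times each arm has been played
    means = [0.0] * n_arms  # empirical mean reward of each arm

    # Initialization: play every arm once.
    for arm in range(n_arms):
        means[arm] = pull(arm)
        counts[arm] = 1

    for t in range(n_arms, horizon):
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm accumulates plays.
        arm = max(range(n_arms),
                  key=lambda a: means[a] + math.sqrt(2 * math.log(t + 1) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Toy usage: three Bernoulli arms with unknown success probabilities.
probs = [0.2, 0.5, 0.7]
means, counts = ucb1(3, 10_000, lambda a: 1.0 if random.random() < probs[a] else 0.0)
print(counts)  # the best arm (index 2) should dominate the play counts
```

For rewards bounded in [0, 1], UCB1's expected regret grows logarithmically in the horizon, matching (up to constants) the order of the asymptotic lower bound of Lai and Robbins.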

Notes

  1. Strategies of this type are also called no-regret policies, but the term is potentially confusing and is therefore not used here.

  2. MAB with switching costs can be cast as a restless bandit problem, which is discussed in Chap. 3.

Author information

Corresponding author

Correspondence to Rong Zheng.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Zheng, R., Hua, C. (2016). Stochastic Multi-armed Bandit. In: Sequential Learning and Decision-Making in Wireless Resource Management. Wireless Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-50502-2_2

  • DOI: https://doi.org/10.1007/978-3-319-50502-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50501-5

  • Online ISBN: 978-3-319-50502-2

  • eBook Packages: Computer Science, Computer Science (R0)
