
Extending Sliding-Step Importance Weighting from Supervised Learning to Reinforcement Learning

  • Conference paper
Artificial Intelligence. IJCAI 2019 International Workshops (IJCAI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12158)


Abstract

Stochastic gradient descent (SGD) has been at the center of many advances in modern machine learning. SGD processes examples sequentially, updating a weight vector in the direction that would most reduce the loss for that example. In many applications, some examples are more important than others and, to capture this, each example is given a non-negative weight that modulates its impact. Unfortunately, if the importance weights are highly variable they can greatly exacerbate the difficulty of setting the step-size parameter of SGD. To ease this difficulty, Karampatziakis and Langford [6] developed a class of elegant algorithms that are much more robust in the face of highly variable importance weights in supervised learning. In this paper we extend their idea, which we call “sliding step”, to reinforcement learning, where importance weights can be particularly variable due to the importance sampling involved in off-policy learning algorithms. We compare two alternative ways of doing the extension in the linear function approximation setting, then introduce specific sliding-step versions of the TD(0) and Emphatic TD(0) learning algorithms. We prove the convergence of our algorithms and demonstrate their effectiveness on both on-policy and off-policy problems. Overall, our new algorithms appear to be effective in bringing the robustness of the sliding-step technique from supervised learning to reinforcement learning.

2nd Scaling-Up Reinforcement Learning (SURL) Workshop, IJCAI 2019.
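To make the sliding-step idea concrete, here is a minimal sketch (not taken from the paper) of the importance-aware update of Karampatziakis and Langford [6] for squared loss with linear predictions, contrasted with the ordinary importance-weighted SGD step. The closed form arises from integrating infinitesimal SGD steps whose importance weights sum to h. The TD(0)-style variant at the end is only one plausible way such an extension could look, with the bootstrapped target held fixed and the importance-sampling ratio playing the role of the weight; the algorithms actually analyzed in the paper may differ, and all names below are illustrative.

```python
import numpy as np

def sgd_step(w, x, y, h, alpha):
    """Ordinary importance-weighted SGD on squared loss 0.5*(y - w.x)^2.
    The importance weight h simply scales the gradient, so a large h can
    push the prediction far past the target y."""
    return w + alpha * h * (y - w @ x) * x

def sliding_step(w, x, y, h, alpha):
    """Importance-aware ('sliding-step') update for squared loss [6]:
    the closed-form limit of splitting the weight h into many tiny SGD
    steps. The prediction moves toward y but never past it."""
    xx = x @ x
    if xx == 0.0:
        return w
    gain = (1.0 - np.exp(-alpha * h * xx)) / xx
    return w + gain * (y - w @ x) * x

def sliding_step_td0(w, x, r, x_next, rho, alpha, gamma=0.99):
    """Hedged sketch of one way to carry the idea to off-policy TD(0)
    with linear function approximation: treat the bootstrapped target
    z = r + gamma * w.x_next as fixed and use the importance-sampling
    ratio rho as the weight. Illustrative only, not the paper's exact
    algorithm."""
    z = r + gamma * (w @ x_next)
    return sliding_step(w, x, z, rho, alpha)

# Toy comparison with a highly variable importance weight (h = 1000):
rng = np.random.default_rng(0)
w = np.zeros(5)
x = rng.normal(size=5)
print(sgd_step(w, x, y=1.0, h=1000.0, alpha=0.1) @ x)      # far past the target
print(sliding_step(w, x, y=1.0, h=1000.0, alpha=0.1) @ x)  # close to, never past, 1.0
```

In this sketch the ordinary step drives the prediction well beyond the target when h is large, whereas the sliding-step prediction saturates at the target no matter how large alpha * h becomes; this robustness to variable weights is the property the paper carries over to TD-style updates.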


References

  1. Beygelzimer, A., Dasgupta, S., Langford, J.: Importance weighted active learning. In: Proceedings of the 26th International Conference on Machine Learning (ICML), pp. 49–56 (2009)

  2. Bradtke, S.J., Barto, A.G.: Linear least-squares algorithms for temporal difference learning. Mach. Learn. 22, 33–57 (1996)

  3. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)

  4. Ghiassian, S., Patterson, A., White, M., Sutton, R.S., White, A.: Online off-policy prediction. arXiv:1811.02597 (2018)

  5. Huang, J., Smola, A.J., Gretton, A., Borgwardt, K.M., Schölkopf, B.: Correcting sample selection bias by unlabeled data. Adv. Neural Inf. Process. Syst. 19, 601–608 (2006)

  6. Karampatziakis, N., Langford, J.: Online importance weight aware updates. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 392–399 (2011)

  7. Precup, D., Sutton, R.S., Dasgupta, S.: Off-policy temporal-difference learning with function approximation. In: Proceedings of the 18th International Conference on Machine Learning (ICML), pp. 417–424 (2001)

  8. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)

  9. Sutton, R.S.: Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9–44 (1988)

  10. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  11. Sutton, R.S., et al.: Fast gradient-descent methods for temporal-difference learning with linear function approximation. In: Proceedings of the 26th International Conference on Machine Learning (ICML), pp. 993–1000 (2009)

  12. Sutton, R.S., Maei, H.R., Szepesvári, C.: A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In: Advances in Neural Information Processing Systems 21 (NIPS), pp. 1609–1616 (2008)

  13. Sutton, R.S., Mahmood, A.R., White, M.: An emphatic approach to the problem of off-policy temporal-difference learning. J. Mach. Learn. Res. 17(73), 1–29 (2016)

  14. Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Mach. Learn. 16(3), 185–202 (1994)

  15. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal-difference learning with function approximation. IEEE Trans. Autom. Control 42(5), 674–690 (1997)


Author information


Corresponding author: Tian Tian.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tian, T., Sutton, R.S. (2020). Extending Sliding-Step Importance Weighting from Supervised Learning to Reinforcement Learning. In: El Fallah Seghrouchni, A., Sarne, D. (eds) Artificial Intelligence. IJCAI 2019 International Workshops. IJCAI 2019. Lecture Notes in Computer Science, vol. 12158. Springer, Cham. https://doi.org/10.1007/978-3-030-56150-5_4


  • DOI: https://doi.org/10.1007/978-3-030-56150-5_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-56149-9

  • Online ISBN: 978-3-030-56150-5

  • eBook Packages: Computer Science, Computer Science (R0)
