
Self-Practice Imitation Learning from Weak Policy

  • Conference paper
Partially Supervised Learning (PSL 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8183)


Abstract

Imitation learning is an effective strategy for reinforcement learning that avoids the delayed-reward problem by learning from mentor-demonstrated trajectories. A limitation of imitation learning is that collecting sufficient qualified demonstrations is quite expensive. In this work, we study how an agent can automatically improve its performance starting from a weak policy, by acquiring more demonstrations for learning on its own. We propose the LEWE framework, which samples tasks for the weak policy to execute and then learns from the successful trajectories to achieve an improvement. Since the sampling strategy is key to the efficiency of LEWE, we further propose incorporating active learning into the sampling strategy. Experiments on a spatial positioning task show that LEWE with active learning can effectively and efficiently improve the weak policy, achieving better performance than the compared sampling approaches.
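The self-practice loop the abstract describes can be sketched concretely. The following is a minimal, self-contained illustration in a toy 1-D positioning domain: a noisy "weak" policy practices on sampled tasks, only the successful trajectories are kept as demonstrations, and a new policy is learned from them by supervised imitation. All names (`weak_policy`, `run_episode`, `learn_from_demos`) and the uniform task sampling are hypothetical stand-ins, not the paper's actual design; in particular, the paper's key contribution of an active-learning-based sampling strategy is replaced here by plain random sampling.

```python
import random
from collections import Counter

# Toy spatial positioning task (illustrative stand-in for the paper's domain):
# the agent starts at `start` on a 1-D line and must reach `target` within T steps.
ACTIONS = (-1, 0, 1)
T = 30

def weak_policy(pos, target, noise=0.4):
    """Move greedily toward the target, but act randomly a `noise` fraction of the time."""
    greedy = 1 if target > pos else (-1 if target < pos else 0)
    if random.random() < noise:
        return random.choice(ACTIONS)
    return greedy

def run_episode(policy, start, target):
    """Execute a policy; return (success, trajectory of (state, action) pairs)."""
    pos, traj = start, []
    for _ in range(T):
        if pos == target:
            return True, traj
        action = policy(pos, target)
        traj.append(((pos, target), action))
        pos += action
    return pos == target, traj

def learn_from_demos(demos):
    """Supervised imitation: majority-vote action per relative offset (target - pos)."""
    votes = {}
    for (pos, target), action in demos:
        votes.setdefault(target - pos, Counter())[action] += 1
    def learned(pos, target):
        offset = target - pos
        if offset in votes:
            return votes[offset].most_common(1)[0][0]
        # Fall back to the greedy direction for offsets never demonstrated.
        return 1 if offset > 0 else (-1 if offset < 0 else 0)
    return learned

# Self-practice loop: sample tasks, let the weak policy execute them,
# and keep only the successful trajectories as demonstrations.
random.seed(0)
demos = []
for _ in range(500):
    start, target = random.randint(-10, 10), random.randint(-10, 10)
    ok, traj = run_episode(weak_policy, start, target)
    if ok:
        demos.extend(traj)

improved = learn_from_demos(demos)

def success_rate(policy, trials=200):
    """Fraction of random tasks the policy solves within T steps."""
    wins = 0
    for _ in range(trials):
        s, t = random.randint(-10, 10), random.randint(-10, 10)
        ok, _ = run_episode(policy, s, t)
        wins += ok
    return wins / trials

print(f"weak: {success_rate(weak_policy):.2f}, improved: {success_rate(improved):.2f}")
```

Because the weak policy is right more often than it is wrong, the majority vote over successful trajectories recovers the greedy behavior, so the learned policy outperforms its own teacher. The paper's active-learning variant would replace the uniform task sampling above with queries for the most informative tasks.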



Acknowledgments

This research was supported by the Jiangsu Science Foundation (BK2012303), the 2013 State Grid Research Project, and the Baidu Fund (181315P00651).

Author information


Correspondence to Yang Yu.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Da, Q., Yu, Y., Zhou, ZH. (2013). Self-Practice Imitation Learning from Weak Policy. In: Zhou, ZH., Schwenker, F. (eds) Partially Supervised Learning. PSL 2013. Lecture Notes in Computer Science, vol. 8183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40705-5_2


  • DOI: https://doi.org/10.1007/978-3-642-40705-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40704-8

  • Online ISBN: 978-3-642-40705-5

  • eBook Packages: Computer Science, Computer Science (R0)
