
Self-Practice Imitation Learning from Weak Policy

  • Conference paper
Partially Supervised Learning (PSL 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8183)


Abstract

Imitation learning is an effective strategy for reinforcement learning that avoids the delayed-reward problem by learning from mentor-demonstrated trajectories. A limitation of imitation learning is that collecting sufficient qualified demonstrations is quite expensive. In this work, we study how an agent can automatically improve its performance starting from a weak policy, by acquiring more demonstrations for learning on its own. We propose the LEWE framework, which samples tasks for the weak policy to execute and then learns from the successful trajectories to achieve an improvement. Since the sampling strategy is key to the efficiency of LEWE, we further propose incorporating active learning into the sampling strategy. Experiments on a spatial positioning task show that LEWE with active learning can effectively and efficiently improve the weak policy, achieving better performance than the compared sampling approaches.
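The self-practice loop the abstract describes can be sketched concretely. The following is a minimal, self-contained illustration in a toy 1-D positioning domain: a noisy "weak" policy practices on sampled tasks, only the successful trajectories are kept as demonstrations, and a new policy is learned from them by supervised imitation. All names (`weak_policy`, `run_episode`, `learn_from_demos`) and the uniform task sampling are hypothetical stand-ins, not the paper's actual design; in particular, the paper's key contribution of an active-learning-based sampling strategy is replaced here by plain random sampling.

```python
import random
from collections import Counter

# Toy spatial positioning task (illustrative stand-in for the paper's domain):
# the agent starts at `start` on a 1-D line and must reach `target` within T steps.
ACTIONS = (-1, 0, 1)
T = 30

def weak_policy(pos, target, noise=0.4):
    """Move greedily toward the target, but act randomly a `noise` fraction of the time."""
    greedy = 1 if target > pos else (-1 if target < pos else 0)
    if random.random() < noise:
        return random.choice(ACTIONS)
    return greedy

def run_episode(policy, start, target):
    """Execute a policy; return (success, trajectory of (state, action) pairs)."""
    pos, traj = start, []
    for _ in range(T):
        if pos == target:
            return True, traj
        action = policy(pos, target)
        traj.append(((pos, target), action))
        pos += action
    return pos == target, traj

def learn_from_demos(demos):
    """Supervised imitation: majority-vote action per relative offset (target - pos)."""
    votes = {}
    for (pos, target), action in demos:
        votes.setdefault(target - pos, Counter())[action] += 1
    def learned(pos, target):
        offset = target - pos
        if offset in votes:
            return votes[offset].most_common(1)[0][0]
        # Fall back to the greedy direction for offsets never demonstrated.
        return 1 if offset > 0 else (-1 if offset < 0 else 0)
    return learned

# Self-practice loop: sample tasks, let the weak policy execute them,
# and keep only the successful trajectories as demonstrations.
random.seed(0)
demos = []
for _ in range(500):
    start, target = random.randint(-10, 10), random.randint(-10, 10)
    ok, traj = run_episode(weak_policy, start, target)
    if ok:
        demos.extend(traj)

improved = learn_from_demos(demos)

def success_rate(policy, trials=200):
    """Fraction of random tasks the policy solves within T steps."""
    wins = 0
    for _ in range(trials):
        s, t = random.randint(-10, 10), random.randint(-10, 10)
        ok, _ = run_episode(policy, s, t)
        wins += ok
    return wins / trials

print(f"weak: {success_rate(weak_policy):.2f}, improved: {success_rate(improved):.2f}")
```

Because the weak policy is right more often than it is wrong, the majority vote over successful trajectories recovers the greedy behavior, so the learned policy outperforms its own teacher. The paper's active-learning variant would replace the uniform task sampling above with queries for the most informative tasks.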



Acknowledgments

This research was supported by the Jiangsu Science Foundation (BK2012303), the 2013 State Grid Research Project, and the Baidu Fund (181315P00651).

Author information


Correspondence to Yang Yu.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Da, Q., Yu, Y., Zhou, ZH. (2013). Self-Practice Imitation Learning from Weak Policy. In: Zhou, ZH., Schwenker, F. (eds) Partially Supervised Learning. PSL 2013. Lecture Notes in Computer Science, vol. 8183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40705-5_2


  • DOI: https://doi.org/10.1007/978-3-642-40705-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40704-8

  • Online ISBN: 978-3-642-40705-5

  • eBook Packages: Computer Science, Computer Science (R0)
