Learning to Learn Using Gradient Descent

  • Conference paper
Artificial Neural Networks — ICANN 2001 (ICANN 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2130)

Abstract

This paper introduces the application of gradient descent methods to meta-learning. The concept of “meta-learning”, i.e., of a system that improves or discovers a learning algorithm, has been of interest in machine learning for decades because of its appealing applications. Previous meta-learning approaches have been based on evolutionary methods and were therefore restricted to small models with few free parameters. We make meta-learning in large systems feasible by using recurrent neural networks, with their attendant learning routines, as meta-learning systems. Our system derived complex, well-performing learning algorithms from scratch. We also show that our approach handles non-stationary time series prediction.
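The core idea can be made concrete with a small sketch. The meta-learner is a recurrent network (an LSTM in the authors' experiments) that sees, at each step, the current input together with the previous target; training its weights by ordinary gradient descent across many randomly drawn tasks pushes the network to implement an on-line learning algorithm in its recurrent dynamics. The sketch below, which is not the authors' original code, illustrates this setup; the task family (random linear functions), the layer sizes, and the Adam optimizer are illustrative assumptions rather than the paper's configuration.

```python
# Minimal sketch of gradient-based meta-learning with a recurrent network,
# in the spirit of the paper. An LSTM receives the current input x_t and the
# previous target y_{t-1}; trained across many random tasks, its fixed weights
# come to encode an on-line learning algorithm.
# Task family, sizes and optimizer are illustrative assumptions.

import torch
import torch.nn as nn

class MetaLearnerRNN(nn.Module):
    def __init__(self, in_dim=1, hidden=32):
        super().__init__()
        # Input = current x_t concatenated with previous target y_{t-1}
        self.lstm = nn.LSTM(in_dim + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, y_prev):
        h, _ = self.lstm(torch.cat([x, y_prev], dim=-1))
        return self.head(h)

def sample_task(batch=16, steps=50):
    """Toy task family: y = a*x + b with random (a, b) per sequence."""
    a = torch.randn(batch, 1, 1)
    b = torch.randn(batch, 1, 1)
    x = torch.rand(batch, steps, 1) * 2 - 1
    y = a * x + b
    return x, y

model = MetaLearnerRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x, y = sample_task()
    # Shift targets by one step so the network sees y_{t-1} at time t and can
    # infer the task parameters (a, b) from the observed sequence itself.
    y_prev = torch.cat([torch.zeros_like(y[:, :1]), y[:, :-1]], dim=1)
    pred = model(x, y_prev)
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After meta-training, such a network is evaluated on unseen tasks with its weights frozen: any within-sequence adaptation comes entirely from the hidden state, which is the sense in which the fixed-weight network has learned to learn.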




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hochreiter, S., Younger, A.S., Conwell, P.R. (2001). Learning to Learn Using Gradient Descent. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_13


  • DOI: https://doi.org/10.1007/3-540-44668-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42486-4

  • Online ISBN: 978-3-540-44668-2

  • eBook Packages: Springer Book Archive
