Multitask Learning

  • Chapter
Learning to Learn

Abstract

Multitask Learning (MTL) is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need for supervisory signals, and presents new results for MTL with k-nearest neighbor and kernel regression. In this paper we demonstrate multitask learning in three domains. We explain how multitask learning works, and show that there are many opportunities for multitask learning in real domains. We present an algorithm and results for multitask learning with case-based methods like k-nearest neighbor and kernel regression, and sketch an algorithm for multitask learning in decision trees. Because multitask learning works, can be applied to many different kinds of domains, and can be used with different learning algorithms, we conjecture there will be many opportunities for its use on real-world problems.
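
The idea of learning tasks in parallel over a shared representation can be illustrated with a small backprop net. The following is a minimal sketch, not the chapter's experimental setup: it assumes a single tanh hidden layer shared by all tasks, one linear output per task, and plain full-batch gradient descent on the squared error summed over tasks. The function name `train_shared_net` and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np

def train_shared_net(X, Y, hidden=8, lr=0.05, epochs=500, seed=0):
    """Train a net whose single hidden layer is shared by every task.

    X: (n_samples, n_inputs) inputs.
    Y: (n_samples, n_tasks) targets, one column per task.
    Returns (W1, b1, W2, b2) and the per-epoch loss history, where the
    loss is the squared error summed over tasks and averaged over samples.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    t = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (d, hidden))   # input -> shared hidden layer
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, t))   # shared hidden layer -> task outputs
    b2 = np.zeros(t)
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # shared representation
        P = H @ W2 + b2                      # one prediction column per task
        err = P - Y
        history.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backprop of half the loss above. Errors from all tasks flow
        # into the same hidden units, so every task shapes W1.
        gP = err / n
        gW2 = H.T @ gP
        gb2 = gP.sum(axis=0)
        gZ = (gP @ W2.T) * (1.0 - H ** 2)    # derivative of tanh
        gW1 = X.T @ gZ
        gb1 = gZ.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history
```

To use the sketch, place the main task and any related extra tasks in separate columns of `Y`; at prediction time only the main task's output column needs to be read, while the extra columns serve purely as additional training signals for the shared hidden layer.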

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Caruana, R. (1998). Multitask Learning. In: Thrun, S., Pratt, L. (eds) Learning to Learn. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5529-2_5

  • DOI: https://doi.org/10.1007/978-1-4615-5529-2_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7527-2

  • Online ISBN: 978-1-4615-5529-2

  • eBook Packages: Springer Book Archive
