From Active to Proactive Learning Methods

  • Chapter
Advances in Machine Learning I

Part of the book series: Studies in Computational Intelligence ((SCI,volume 262))

Abstract

In many machine learning tasks, unlabeled data abounds, but expert-generated labels are scarce. Consider learning to build a classifier for the Sloan Digital Sky Survey (http://www.sdss.org/) so that each astronomical observation may be assigned its class (e.g. “pinwheel galaxy”, “globular galaxy”, “quasar”, “colliding galaxies”, “nebula”). The SDSS contains 230 million astronomical objects, of which professional astronomers have manually classified less than one tenth of 1 percent. Or consider classifying web pages into subject-matter taxonomies, such as the Yahoo taxonomy or the Dewey library catalog system. Whereas there are many billions of web pages, less than 0.001% have reliable topic or subject categories.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Donmez, P., Carbonell, J.G. (2010). From Active to Proactive Learning Methods. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds) Advances in Machine Learning I. Studies in Computational Intelligence, vol 262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05177-7_5

  • DOI: https://doi.org/10.1007/978-3-642-05177-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05176-0

  • Online ISBN: 978-3-642-05177-7

  • eBook Packages: Engineering, Engineering (R0)
