From Active to Proactive Learning Methods

Donmez, Pinar; Carbonell, Jaime G.

doi:10.1007/978-3-642-05177-7_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 262)

Abstract

In many machine learning tasks, unlabeled data abounds, but expert-generated labels are scarce. Consider the process of learning to build a classifier for the Sloan Digital Sky Survey (http://www.sdss.org/) so that each astronomical observation may be assigned its class (e.g. “pinwheel galaxy”, “globular galaxy”, “quasar”, “colliding galaxies”, “nebula”, etc.). The SDSS contains 230 million astronomical objects, of which professional astronomers have manually classified less than one tenth of 1 percent. Similarly, consider classifying web pages into subject-matter-based taxonomies, such as the Yahoo taxonomy or a Dewey library catalog system. Whereas there are many billions of web pages, less than 0.001% have reliable topic or subject categories.

Author information

Authors and Affiliations

Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Pinar Donmez & Jaime G. Carbonell

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Ordona 21, 01-237, Warsaw, Poland
Jacek Koronacki & Sławomir T. Wierzchoń
Woodward Hall 430C, University of North Carolina, 9201 University City Blvd., Charlotte, N.C. 28223, USA
Zbigniew W. Raś
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Janusz Kacprzyk

About this chapter

Cite this chapter

Donmez, P., Carbonell, J.G. (2010). From Active to Proactive Learning Methods. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds) Advances in Machine Learning I. Studies in Computational Intelligence, vol 262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05177-7_5

DOI: https://doi.org/10.1007/978-3-642-05177-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05176-0
Online ISBN: 978-3-642-05177-7
eBook Packages: Engineering, Engineering (R0)
