Abstract
Effective incorporation of human expertise, while exerting a low cognitive load, is a critical aspect of real-life text classification applications that is not adequately addressed by batch-supervised high-accuracy learners. Standard text classifiers are supervised in only one way: assigning labels to whole documents. They are thus deprived of the enormous wisdom that humans carry about the significance of words and phrases in context. We present HIClass, an interactive and exploratory labeling package that actively collects user opinion on feature representations and choices, as well as whole-document labels, while minimizing redundancy in the input sought. Preliminary experience suggests that, starting with essentially an unlabeled corpus, very little cognitive labor suffices to set up a labeled collection on which standard classifiers perform well.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, Springer, Heidelberg (1998)
Nigam, K., Lafferty, J., McCallum, A.: Using maximum entropy for text classification. In: IJCAI 1999 Workshop on Information Filtering (1999)
Zhang, J., Yang, Y.: Robustness of regularized linear classification methods in text categorization. In: SIGIR (2003)
Third workshop on Operational Text Classification OTC 2003. In conjunction with SIGKDD 2003 (2003)
Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. In: Advances in Neural Information Processing Systems (1995)
Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. Journal of Machine Learning Research 2, 45–66 (2001)
Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Machine Learning 28(2-3), 133–168 (1997)
Hsu, C., Lin, C.: A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks 13, 415–425 (2002)
Harpale, A.: Practical alternatives for active learning with applications to text classification. Master’s Thesis, IIT Bombay (2004)
McCallum, K., Nigam, K.: Employing EM in pool-based active learning for text classification. In: Proceedings of ICML 1998 (1998)
Sarawagi, S., Chakrabarti, S., Godbole, S.: Cross-training: Learning probabilistic mappings between topics. In: SIGKDD (2003)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Lizotte, D., Madani, O., Greiner, R.: Budgeted learning of naive-bayes classifiers. In: UAI (2003)
Avancini, H., Lavelli, A., Magnini, B., Sebastiani, F., Zanoli, R.: Expanding domain-specific lexicons by term categorization. In: Matsui, M., Zuccherato, R.J. (eds.) SAC 2003. LNCS, vol. 3006, Springer, Heidelberg (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Godbole, S., Harpale, A., Sarawagi, S., Chakrabarti, S. (2004). Document Classification Through Interactive Supervision of Document and Term Labels. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. PKDD 2004. Lecture Notes in Computer Science(), vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_19
Download citation
DOI: https://doi.org/10.1007/978-3-540-30116-5_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive