Personalized Classification for Keyword-Based Category Profiles

Sun, Aixin; Lim, Ee-Peng; Ng, Wee-Keong

doi:10.1007/3-540-45747-X_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Personalized classification refers to allowing users to define their own categories and automating the assignment of documents to these categories. In this paper, we examine the use of keywords to define personalized categories and propose the use of Support Vector Machines (SVMs) to perform personalized classification. We investigate two scenarios. The first assumes that the personalized categories are defined in a flat category space. The second assumes that each personalized category is defined within a pre-defined general category that provides a more specific context for the personalized category. The training documents for personalized categories are obtained from a training document pool using a search engine and a set of keywords. Our experiments yielded better classification results under the second scenario. We also conclude that the number of keywords used can be very small, and that increasing it does not always lead to better classification performance.
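
The pipeline described in the abstract (keywords, then retrieved training documents, then an SVM) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: a plain keyword-count ranking stands in for the search engine, the remaining pool documents are treated as negative examples, the toy `pool`, the helper names and all parameter values are hypothetical, and a small subgradient-descent trainer for a linear SVM stands in for whatever SVM implementation the authors used.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def keyword_score(doc, keywords):
    # Stand-in for a search engine: total occurrences of the profile keywords.
    counts = Counter(tokenize(doc))
    return sum(counts[k] for k in keywords)

def build_training_set(pool, keywords, top_k):
    # Assumption: the top_k keyword matches are positives, all other pool documents negatives.
    ranked = sorted(pool, key=lambda d: keyword_score(d, keywords), reverse=True)
    positives = [d for d in ranked[:top_k] if keyword_score(d, keywords) > 0]
    negatives = [d for d in ranked if d not in positives]
    return positives, negatives

def vectorize(doc, vocab):
    # Bag-of-words term counts over a fixed vocabulary; unknown words are ignored.
    counts = Counter(tokenize(doc))
    return [float(counts[t]) for t in vocab]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    # Subgradient descent on the L2-regularised hinge loss (linear SVM).
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: hinge term contributes
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # outside the margin: only the regulariser acts
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(doc, vocab, w, b):
    x = vectorize(doc, vocab)
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0

# Hypothetical toy pool and a two-keyword category profile.
pool = [
    "svm kernel margin classifier training",
    "text classifier training with svm",
    "football match score goal",
    "stock market price trading",
    "recipe flour sugar oven",
]
keywords = ["svm", "classifier"]

positives, negatives = build_training_set(pool, keywords, top_k=2)
vocab = sorted({t for d in pool for t in tokenize(d)})
X = [vectorize(d, vocab) for d in positives + negatives]
y = [1] * len(positives) + [-1] * len(negatives)
w, b = train_linear_svm(X, y)

print(predict("svm classifier training", vocab, w, b))
print(predict("football score tonight", vocab, w, b))
```

In this sketch the category profile is just the two-keyword list, which mirrors the abstract's point that a very small keyword set can be enough to seed training. The second scenario could be approximated by restricting `pool` to documents of one general category before retrieval, though that too is an assumption about how the context would be applied.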

The work is partially supported by the SingAREN 21 research grant M48020004.

Author information

Authors and Affiliations

Centre for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Singapore
Aixin Sun, Ee-Peng Lim & Wee-Keong Ng

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131, Padova, Italy
Maristella Agosti
Istituto di Scienza e Tecnologie dell'Informazione (ISTI-CNR), Area della Ricerca CNR di Pisa, Via G. Moruzzi 1, 56124, Pisa, Italy
Costantino Thanos

About this paper

Cite this paper

Sun, A., Lim, EP., Ng, WK. (2002). Personalized Classification for Keyword-Based Category Profiles. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_5

DOI: https://doi.org/10.1007/3-540-45747-X_5
Published: 13 September 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44178-6
Online ISBN: 978-3-540-45747-3
eBook Packages: Springer Book Archive
