Abstract
Improving the accuracy of assigning new email messages to small folders can reduce the likelihood of users creating duplicate folders for some topics. In this paper we present a hybrid classification model, PERC, and use the Enron Email Corpus to investigate the performance of kNN, SVM and PERC in a simulation of a real-time situation. Our results show that PERC is significantly better at assigning messages to small folders. We also discuss the effects of different parameter settings for the classifiers.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ke, SW., Bowerman, C., Oakes, M. (2006). PERC: A Personal Email Classifier. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_41
DOI: https://doi.org/10.1007/11735106_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33347-0
Online ISBN: 978-3-540-33348-7
eBook Packages: Computer Science, Computer Science (R0)