Refinement Method of Post-processing and Training for Improvement of Automated Text Classification

  • Conference paper
Computational Science and Its Applications - ICCSA 2006 (ICCSA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3981)

Abstract

The paper presents a method for improving text classification by exploiting examples that are difficult to classify. Research on improving text categorization performance has generally focused on enhancing existing classification models and algorithms themselves, but the scope of such work has been limited by the feature-based statistical methodology. In this paper, we propose a new method that improves classification accuracy and performance through refinement training and post-processing. In particular, we focus on complex documents that are generally considered hard to classify. Unlike traditional classification methods, our proposed method takes a data mining strategy and a fault-tolerant system approach. In experiments, we applied our system to documents that usually receive low classification accuracy because they lie on a decision boundary. The results show that our system achieves high accuracy and stability under realistic conditions.
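The abstract does not say how "difficult" documents are detected or handled. As a hedged illustration of the post-processing idea only (not the authors' method, and not covering refinement training), the sketch below assumes a base classifier that emits a score per class, treats a document as lying near a decision boundary when the margin between its two highest scores falls below a threshold, and hands such documents to a secondary classifier. The function names, the 0.1 threshold, and the `fallback` callable are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: margin-based detection of near-boundary documents,
# followed by a post-processing pass. This is NOT the method from the paper.

Scores = Dict[str, float]  # class label -> base classifier score for one document


def top_two_margin(scores: Scores) -> Tuple[str, float]:
    """Return the best label and how far its score exceeds the runner-up."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, best_score - runner_up


def split_by_boundary(
    docs: Dict[str, Scores], threshold: float = 0.1
) -> Tuple[Dict[str, str], List[str]]:
    """Separate confidently labelled documents from near-boundary ones."""
    confident: Dict[str, str] = {}
    hard: List[str] = []
    for doc_id, scores in docs.items():
        label, margin = top_two_margin(scores)
        if margin < threshold:
            hard.append(doc_id)  # small margin: assumed to lie near a boundary
        else:
            confident[doc_id] = label
    return confident, hard


def post_process(
    docs: Dict[str, Scores],
    fallback: Callable[[str], str],
    threshold: float = 0.1,
) -> Dict[str, str]:
    """Keep confident labels; re-label hard documents with a fallback classifier."""
    labels, hard = split_by_boundary(docs, threshold)
    for doc_id in hard:
        labels[doc_id] = fallback(doc_id)  # hypothetical secondary classifier
    return labels


if __name__ == "__main__":
    docs = {
        "d1": {"sports": 0.9, "politics": 0.1},
        "d2": {"sports": 0.52, "politics": 0.48},
    }
    print(split_by_boundary(docs))  # ({'d1': 'sports'}, ['d2'])
    print(post_process(docs, fallback=lambda doc_id: "politics"))  # {'d1': 'sports', 'd2': 'politics'}
```

Here `d2` falls inside the margin and is re-labelled by the fallback, while `d1` keeps its base label. A score margin is only one possible proxy for "hard to classify"; the criterion used in the paper may differ.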

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Choi, Y.J., Park, S.S. (2006). Refinement Method of Post-processing and Training for Improvement of Automated Text Classification. In: Gavrilova, M.L., et al. Computational Science and Its Applications - ICCSA 2006. ICCSA 2006. Lecture Notes in Computer Science, vol 3981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11751588_32

  • DOI: https://doi.org/10.1007/11751588_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34072-0

  • Online ISBN: 978-3-540-34074-4

  • eBook Packages: Computer Science, Computer Science (R0)
