ACUT: An Associative Classifier Approach to Unknown Word POS Tagging

  • Conference paper
Artificial Intelligence and Signal Processing (AISP 2013)

Abstract

This article focuses on Part-of-Speech (POS) tagging of unknown words. POS tagging is one of the fundamental requirements for intelligent text processing that depends on the language of the text. The article therefore first aims to provide a high-accuracy POS tagger for the Persian language. To handle unknown words, it proposes combining a type of associative classifier with a Hidden Markov Model (HMM) algorithm. Associative classification is a new classification approach that integrates association mining and classification. The associative classifier used in this study is a new variant introduced by this research; it uses both sequence probability and the CBA classifier. CBA first generates all association rules that satisfy given support and confidence thresholds as candidate rules. It then selects a small subset of these rules to form a classifier. To predict the class label of an example, it chooses the best rule whose body the example satisfies. According to the experimental results, the proposed algorithm raises the accuracy of Persian unknown-word POS tagging to 81.8%. The overall accuracy of the proposed tagger is 98%, and its sentence-level accuracy is 63.1%.
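
The three CBA steps described in the abstract (candidate rule generation under support and confidence thresholds, selection of a small rule set, and prediction with the best matching rule) can be illustrated with a minimal Python sketch. This is not the authors' ACUT implementation and it omits the HMM and sequence-probability components. The feature items (`suf=...`, `prev=...`), the tag names, the thresholds, and the function names `mine_rules`, `select_rules`, and `predict` are all assumptions made for illustration. Rule selection is a simplified form of CBA's database-coverage idea, and the default class is passed in by hand rather than derived from the data.

```python
from collections import Counter
from itertools import combinations


def mine_rules(examples, min_support=0.3, min_confidence=0.6, max_len=2):
    """Generate candidate rules (body -> label) that meet both thresholds.

    examples: list of (set of feature items, label) pairs.
    """
    n = len(examples)
    body_counts = Counter()   # how often each body occurs
    rule_counts = Counter()   # how often each body occurs with each label
    for items, label in examples:
        for k in range(1, max_len + 1):
            for body in combinations(sorted(items), k):
                body_counts[body] += 1
                rule_counts[(body, label)] += 1

    rules = []
    for (body, label), count in rule_counts.items():
        support = count / n
        confidence = count / body_counts[body]
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(body), label, support, confidence))

    # Simplified precedence: higher confidence, then higher support,
    # then shorter body.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def select_rules(rules, examples):
    """Keep a rule only if it correctly classifies a still-uncovered example."""
    remaining = list(examples)
    selected = []
    for rule in rules:
        body, label = rule[0], rule[1]
        covered = [ex for ex in remaining if body <= ex[0]]
        if any(lbl == label for _, lbl in covered):
            selected.append(rule)
            remaining = [ex for ex in remaining if not body <= ex[0]]
        if not remaining:
            break
    return selected


def predict(rules, items, default):
    """Return the label of the first (best) rule whose body is satisfied."""
    for body, label, _, _ in rules:
        if body <= items:
            return label
    return default


# Hypothetical toy data: suffix and previous-tag features of a word.
train = [
    ({"suf=ing", "prev=AUX"}, "V"),
    ({"suf=ing", "prev=AUX"}, "V"),
    ({"suf=ly", "prev=V"}, "ADV"),
    ({"suf=ly", "prev=V"}, "ADV"),
    ({"suf=ed", "prev=DET"}, "ADJ"),
    ({"suf=ed", "prev=AUX"}, "V"),
]

candidates = mine_rules(train)
classifier = select_rules(candidates, train)

print(len(candidates), len(classifier))
print(predict(classifier, {"suf=ly", "prev=V"}, default="N"))
print(predict(classifier, {"suf=xyz", "prev=DET"}, default="N"))
```

On this toy data, six candidate rules pass the thresholds, but only two are kept after selection, since the remaining candidates cover no example that a higher-ranked rule has not already covered. An example that matches no selected rule falls back to the supplied default label.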

Acknowledgments

The authors would like to thank Noor Text Mining Research group of Computer Research Center of Islamic Sciences (www.noorsoft.org) for supporting this work.

Corresponding author

Correspondence to Mohammad Hossein Elahimanesh.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Elahimanesh, M.H., Minaei-Bidgoli, B., Kermani, F. (2014). ACUT: An Associative Classifier Approach to Unknown Word POS Tagging. In: Movaghar, A., Jamzad, M., Asadi, H. (eds) Artificial Intelligence and Signal Processing. AISP 2013. Communications in Computer and Information Science, vol 427. Springer, Cham. https://doi.org/10.1007/978-3-319-10849-0_26

  • DOI: https://doi.org/10.1007/978-3-319-10849-0_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10848-3

  • Online ISBN: 978-3-319-10849-0

  • eBook Packages: Computer Science, Computer Science (R0)