Improving Arabic Part-of-Speech Tagging through Morphological Analysis

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6591)

Abstract

This paper describes our newly developed second-order hidden Markov model part-of-speech (POS) tagging system, designed specifically to tag Arabic texts using small training data. The tagger achieves encouraging results. The paper also presents a hybrid tagging architecture for Arabic, in which our tagger is augmented with a weighted morphological analyzer. Finally, we compare the results of the tagger in two settings: standalone and combined with a high-coverage morphological analyzer. Experimental results obtained with a small training corpus are presented and discussed. The experiments show that the best proposed hybrid architecture significantly improves POS tagging accuracy for unknown words. A precision of 96.6% is obtained when unknown words occur in the test set.
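The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the add-alpha smoothing, the toy English data, the function names `train` and `tag`, and the `analyzer` callback (a stand-in for a weighted morphological analyzer that returns candidate tags with weights) are all assumptions made for illustration. The sketch trains a second-order (trigram) HMM on a small tagged corpus and, for words unseen in training, restricts the Viterbi search to the tags proposed by the analyzer.

```python
import math
from collections import defaultdict

START = "<s>"  # padding tag for the two positions before the sentence


def train(tagged_sentences):
    """Count tag trigrams, their bigram contexts, and (tag, word) pairs."""
    tri, ctx = defaultdict(int), defaultdict(int)
    emit, tag_count = defaultdict(int), defaultdict(int)
    for sent in tagged_sentences:
        t1, t2 = START, START
        for word, t3 in sent:
            tri[(t1, t2, t3)] += 1
            ctx[(t1, t2)] += 1
            emit[(t3, word)] += 1
            tag_count[t3] += 1
            t1, t2 = t2, t3
    vocab = {word for (_, word) in emit}
    return tri, ctx, emit, tag_count, vocab


def tag(words, model, analyzer, alpha=0.1):
    """Viterbi decoding over tag pairs; `analyzer` is a hypothetical callback."""
    tri, ctx, emit, tag_count, vocab = model
    tags = list(tag_count)

    def log_trans(t1, t2, t3):
        # Add-alpha smoothing: an assumption, not the paper's smoothing scheme.
        num = tri.get((t1, t2, t3), 0) + alpha
        den = ctx.get((t1, t2), 0) + alpha * len(tags)
        return math.log(num / den)

    def candidates(word):
        if word in vocab:
            # Known word: relative-frequency estimate of P(word | tag).
            return {t: emit[(t, word)] / tag_count[t]
                    for t in tags if emit.get((t, word), 0) > 0}
        # Unknown word: use the analyzer's weighted tags as emission scores.
        weights = {t: w for t, w in analyzer(word).items()
                   if t in tag_count and w > 0}
        # If the analyzer proposes nothing usable, fall back to all tags.
        return weights or {t: 1.0 / len(tags) for t in tags}

    # best maps a state (previous tag, current tag) to (log score, tag path).
    best = {(START, START): (0.0, [])}
    for word in words:
        step = {}
        for (t1, t2), (score, path) in best.items():
            for t3, e in candidates(word).items():
                s = score + log_trans(t1, t2, t3) + math.log(e)
                if (t2, t3) not in step or s > step[(t2, t3)][0]:
                    step[(t2, t3)] = (s, path + [t3])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy corpus and a toy analyzer, both invented for this example.
corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
model = train(corpus)
toy_analyzer = lambda word: {"N": 0.7, "V": 0.3}
print(tag(["the", "bird", "runs"], model, toy_analyzer))
```

In this sketch the analyzer weights are used directly in place of the emission probability for unknown words, so the transition model still decides between the proposed tags in context. How the paper actually combines the two components is not stated in the abstract.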

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Albared, M., Omar, N., Ab Aziz, M.J. (2011). Improving Arabic Part-of-Speech Tagging through Morphological Analysis. In: Nguyen, N.T., Kim, CG., Janiak, A. (eds) Intelligent Information and Database Systems. ACIIDS 2011. Lecture Notes in Computer Science, vol 6591. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20039-7_32

  • DOI: https://doi.org/10.1007/978-3-642-20039-7_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20038-0

  • Online ISBN: 978-3-642-20039-7

  • eBook Packages: Computer Science, Computer Science (R0)
