Analyzing Tagging Accuracy of Part-of-Speech Taggers

  • Conference paper
  • First Online:
Genetic and Evolutionary Computing (GEC 2015)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 388)

Included in the following conference series:

  • International Conference on Genetic and Evolutionary Computing

Abstract

Automated part-of-speech (POS) tagging has been an active research area for many years and is a foundation of natural language processing systems. The Natural Language Toolkit (NLTK) library for Python provides the necessary tools for tagging, but it does not tell us which methods work best. This work therefore analyzes the performance of several part-of-speech taggers, namely the NLTK Default tagger, the Regex tagger and the N-gram taggers (Unigram, Bigram and Trigram), on specific corpora. The corpora used for the analysis are Brown, Penn Treebank and CoNLL2000. We applied all taggers to these three corpora and show that, whereas the Unigram tagger does the best tagging on all corpora, a combination of taggers does better if it is correctly ordered.
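Combining taggers "in order" means chaining them so that each tagger defers to the next one whenever it cannot decide; in NLTK this is done through the `backoff` argument of sequential taggers such as `UnigramTagger`, `BigramTagger`, `TrigramTagger` and `RegexpTagger`. The sketch below is a simplified pure-Python re-implementation of that backoff idea, not NLTK's code and not the authors' experimental setup. The class names, the single regex rule and the toy training/test sentences are hypothetical, chosen only to make the effect of ordering visible.

```python
import re
from collections import Counter, defaultdict
from typing import Optional

Tagged = list[tuple[str, str]]


class BackoffTagger:
    """Tags one token at a time; defers to `backoff` when it returns None."""

    def __init__(self, backoff: Optional["BackoffTagger"] = None):
        self.backoff = backoff

    def choose_tag(self, tokens: list[str], i: int, history: list) -> Optional[str]:
        raise NotImplementedError

    def tag_one(self, tokens, i, history):
        tag = self.choose_tag(tokens, i, history)
        if tag is None and self.backoff is not None:
            tag = self.backoff.tag_one(tokens, i, history)
        return tag

    def tag(self, tokens: list[str]) -> Tagged:
        history: list = []  # tags already assigned earlier in this sentence
        for i in range(len(tokens)):
            history.append(self.tag_one(tokens, i, history))
        return list(zip(tokens, history))


class DefaultTagger(BackoffTagger):
    """Always returns the same tag, so its backoff is never consulted."""

    def __init__(self, tag: str, backoff=None):
        super().__init__(backoff)
        self.default = tag

    def choose_tag(self, tokens, i, history):
        return self.default


class RegexTagger(BackoffTagger):
    """Returns the tag of the first pattern that matches the whole word."""

    def __init__(self, patterns: list[tuple[str, str]], backoff=None):
        super().__init__(backoff)
        self.patterns = [(re.compile(p), t) for p, t in patterns]

    def choose_tag(self, tokens, i, history):
        for regex, tag in self.patterns:
            if regex.fullmatch(tokens[i]):
                return tag
        return None


class NgramTagger(BackoffTagger):
    """Most frequent training tag for (previous n-1 tags, current word)."""

    def __init__(self, n: int, train: list[Tagged], backoff=None):
        super().__init__(backoff)
        self.n = n
        counts: dict = defaultdict(Counter)
        for sent in train:
            tags = [t for _, t in sent]
            for i, (word, tag) in enumerate(sent):
                counts[self.context(tags, i, word)][tag] += 1
        self.table = {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

    def context(self, history, i, word):
        # No padding: the tag context is shorter at the start of a sentence.
        return tuple(history[max(0, i - self.n + 1):i]), word

    def choose_tag(self, tokens, i, history):
        return self.table.get(self.context(history, i, tokens[i]))


def accuracy(tagger: BackoffTagger, gold: list[Tagged]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag (None is wrong)."""
    correct = total = 0
    for sent in gold:
        words = [w for w, _ in sent]
        for (_, predicted), (_, expected) in zip(tagger.tag(words), sent):
            correct += predicted == expected
            total += 1
    return correct / total


# Hypothetical toy data, for illustration only.
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
]
test = [
    [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("bird", "NN"), ("sings", "VBZ")],
]

default = DefaultTagger("NN")
bigram_alone = NgramTagger(2, train)
regex = RegexTagger([(r".*s", "VBZ")], backoff=DefaultTagger("NN"))
unigram = NgramTagger(1, train, backoff=regex)
chain = NgramTagger(2, train, backoff=unigram)  # bigram -> unigram -> regex -> default
wrong_order = DefaultTagger("NN", backoff=chain)  # default first: chain never reached

for name, tagger in [("default", default), ("bigram alone", bigram_alone),
                     ("backoff chain", chain), ("wrong order", wrong_order)]:
    print(f"{name}: {accuracy(tagger, test):.3f}")
# default: 0.333
# bigram alone: 0.667
# backoff chain: 1.000
# wrong order: 0.333
```

The bigram tagger on its own fails on unseen contexts (and a `None` tag then spoils the context of the following word), while the chained version recovers those tokens. Putting the default tagger first shows why order matters: since it never returns `None`, nothing behind it is ever consulted. These toy scores say nothing about the results reported on Brown, Penn Treebank or CoNLL2000.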

Author information

Correspondence to Nyein Pyae Pyae Khin or Than Nwe Aung.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Khin, N.P.P., Aung, T.N. (2016). Analyzing Tagging Accuracy of Part-of-Speech Taggers. In: Zin, T., Lin, JW., Pan, JS., Tin, P., Yokota, M. (eds) Genetic and Evolutionary Computing. GEC 2015. Advances in Intelligent Systems and Computing, vol 388. Springer, Cham. https://doi.org/10.1007/978-3-319-23207-2_35

  • DOI: https://doi.org/10.1007/978-3-319-23207-2_35

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23206-5

  • Online ISBN: 978-3-319-23207-2

  • eBook Packages: Engineering, Engineering (R0)
