
A Case Study in Tagging Case in German: An Assessment of Statistical Approaches

  • Conference paper
Systems and Frameworks for Computational Morphology (SFCM 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 380)


Abstract

In this study, we assess the performance of purely statistical approaches, based on supervised machine learning, for predicting case in German (nominative, accusative, dative, genitive, or n/a for tokens that carry no case). We experiment with two treebanks that contain morphological annotations: TIGER and TUEBA (TüBa-D/Z). Systematic comparisons of the optimal parametrizations of the different approaches are based on 10-fold cross-validation. We test taggers based on Hidden Markov Models (HMM), Decision Trees, and Conditional Random Fields (CRF). The CRF approach with our hand-crafted feature model achieves an accuracy of about 94%. It outperforms all other approaches, improving on a baseline HMM trigram tagger by 11% and on a state-of-the-art tagger for rich morphological tagsets by 2%. Moreover, we investigate the effect of adding further (morphological) categories (gender, number, person, part of speech) to the internal tagset used for training. Rich internal tagsets improve the results of all tested approaches.
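The evaluation setup described in the abstract (a tagger trained on a rich internal tagset, but scored only on case, averaged over 10-fold cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the dot-separated tag format such as `NN.Nom.Sg.Masc`, the round-robin fold assignment, and all function names (`to_case`, `k_fold_splits`, `case_accuracy`, `cross_validate`, `train_most_frequent_tag`) are hypothetical, and the most-frequent-tag baseline merely stands in for the HMM, decision-tree, and CRF taggers the paper actually compares.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical data format (assumption): a sentence is a list of
# (token, internal_tag) pairs with dot-separated tags, e.g. "NN.Nom.Sg.Masc".
Sentence = List[Tuple[str, str]]
Tagger = Callable[[List[str]], Sentence]
CASES = ("Nom", "Acc", "Dat", "Gen")


def to_case(internal_tag: str) -> str:
    """Project a rich internal tag onto the external case tagset."""
    for feature in internal_tag.split("."):
        if feature in CASES:
            return feature
    return "n/a"  # the token carries no case feature


def k_fold_splits(items: Sequence, k: int = 10) -> Iterator[Tuple[list, list]]:
    """Yield (train, test) pairs; item j is placed in test fold j % k."""
    for i in range(k):
        test = [x for j, x in enumerate(items) if j % k == i]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


def case_accuracy(gold: Sequence[Sentence], predicted: Sequence[Sentence]) -> float:
    """Token accuracy on case only; errors in gender, number, etc. are ignored."""
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += to_case(gold_tag) == to_case(pred_tag)
            total += 1
    return correct / total if total else 0.0


def cross_validate(sentences: Sequence[Sentence],
                   train: Callable[[List[Sentence]], Tagger], k: int = 10) -> float:
    """Mean case accuracy over k folds (empty test folds are skipped)."""
    scores = []
    for train_part, test_part in k_fold_splits(sentences, k):
        if not test_part:
            continue
        tagger = train(train_part)
        predicted = [tagger([tok for tok, _ in sent]) for sent in test_part]
        scores.append(case_accuracy(test_part, predicted))
    return sum(scores) / len(scores) if scores else 0.0


def train_most_frequent_tag(train: List[Sentence]) -> Tagger:
    """Toy stand-in for an HMM/CRF tagger: most frequent internal tag per word."""
    counts = defaultdict(Counter)
    for sent in train:
        for tok, tag in sent:
            counts[tok][tag] += 1
    # Unknown words get the overall most frequent tag (assumes non-empty training data).
    fallback = Counter(tag for sent in train for _, tag in sent).most_common(1)[0][0]

    def tag(tokens: List[str]) -> Sentence:
        return [(t, counts[t].most_common(1)[0][0] if t in counts else fallback)
                for t in tokens]
    return tag
```

Because only the projected case value is compared, a tagger trained on full tags like `ART.Nom.Sg.Fem` and one trained on case-only tags are scored on equal terms; a prediction that gets the gender wrong but the case right still counts as correct. This separation between internal (training) and external (evaluation) tagsets is what allows the effect of richer internal tagsets to be measured.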

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Clematide, S. (2013). A Case Study in Tagging Case in German: An Assessment of Statistical Approaches. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2013. Communications in Computer and Information Science, vol 380. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40486-3_2

  • DOI: https://doi.org/10.1007/978-3-642-40486-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40485-6

  • Online ISBN: 978-3-642-40486-3

  • eBook Packages: Computer Science, Computer Science (R0)
