Russian Tagging and Dependency Parsing Models for Stanford CoreNLP Natural Language Toolkit

  • Conference paper
Knowledge Engineering and Semantic Web (KESW 2017)

Abstract

This paper describes the implementation of a maximum entropy tagging model and a neural network dependency parser model for Russian in the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. Russian is a morphologically rich language and requires full morphological analysis, including annotation of input texts with POS tags, morphological features, and lemmas (unlike languages that do not inflect for case, person, etc., where stemming and POS tagging give enough information about the grammatical behavior of a word form). In Russian, rich morphology is accompanied by free word order, which adds indeterminacy to head-finding rules in parsing procedures. We describe the training data, the linguistic features used to learn the classifiers, and the training and evaluation of the tagging and parsing models.
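To make the three annotation layers named in the abstract concrete (POS tags, morphological features, lemmas) together with dependency heads, the following is a minimal Python sketch that reads tokens in the tab-separated, ten-column CoNLL-U layout used by Universal Dependencies. It is not the authors' code and does not call CoreNLP. The sample sentence and its annotations are hand-written for illustration, and the names `Token`, `parse_feats`, and `parse_conllu` are hypothetical.

```python
from dataclasses import dataclass

# Hand-written illustrative annotation of "Мама мыла раму"
# ("Mom washed the frame"); an assumption, not treebank data.
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
ROWS = [
    ["1", "Мама", "мама", "NOUN", "_",
     "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing", "2", "nsubj", "_", "_"],
    ["2", "мыла", "мыть", "VERB", "_",
     "Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act",
     "0", "root", "_", "_"],
    ["3", "раму", "рама", "NOUN", "_",
     "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing", "2", "obj", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word form
    lemma: str    # dictionary form
    upos: str     # universal POS tag
    feats: dict   # morphological features, e.g. {"Case": "Nom"}
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


def parse_feats(raw):
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(block):
    """Parse one sentence block into a list of Token objects."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            parse_feats(cols[5]), int(cols[6]), cols[7]))
    return tokens


if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE):
        print(tok.form, tok.lemma, tok.upos, tok.feats.get("Case"), tok.head)
```

In this sample the case feature separates the subject (`Case=Nom`) from the object (`Case=Acc`) independently of word position, which is the kind of information that stemming plus a bare POS tag would not supply.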

Notes

  1. https://github.com/MANASLU8/CoreNLP

  2. https://github.com/MANASLU8/CoreNLPRusModels

  3. http://universaldependencies.org/

  4. https://ufal.mff.cuni.cz/udpipe

  5. http://universaldependencies.org/u/pos/all.html

  6. Examples were taken from the poetry of Ilya Kormiltsev; therefore, the English translations in the figure footnotes are approximate and do not preserve the author’s syntax.

  7. https://nlp.stanford.edu/software/tagger.html

  8. https://nlp.stanford.edu/software/nndep.shtml

  9. https://nlp.stanford.edu/nlp/javadoc/javanlp/

  10. http://www.ruscorpora.ru/

  11. https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1983

  12. http://universaldependencies.org/docs/format.html

  13. http://odict.ru/

  14. http://statmt.org/

  15. https://github.com/dialogue-evaluation/morphoRuEval-2017

Acknowledgment

This work was financially supported by the Russian Fund of Basic Research (RFBR), Grant No. 16-36-60055.

Author information

Corresponding author

Correspondence to Liubov Kovriguina.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kovriguina, L., Shilin, I., Shipilo, A., Putintseva, A. (2017). Russian Tagging and Dependency Parsing Models for Stanford CoreNLP Natural Language Toolkit. In: Różewski, P., Lange, C. (eds) Knowledge Engineering and Semantic Web. KESW 2017. Communications in Computer and Information Science, vol 786. Springer, Cham. https://doi.org/10.1007/978-3-319-69548-8_8

  • DOI: https://doi.org/10.1007/978-3-319-69548-8_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69547-1

  • Online ISBN: 978-3-319-69548-8

  • eBook Packages: Computer Science, Computer Science (R0)
