Abstract
The paper presents CrossMorphy, a recently developed open-source morphological processor for Russian texts. The processor performs lemmatization, morphological tagging of both dictionary and non-dictionary words, contextual and non-contextual morphological disambiguation, generation of word forms, and morphemic parsing of words. Besides this extended functionality, emphasis is placed on the linguistic quality of word processing and on easy integration into software projects. CrossMorphy is implemented entirely in C++ on the basis of OpenCorpora vocabulary data. To explain the motivation for its development, several freely available morphological processors for Russian are compared in terms of their linguistic and some technological properties. The experimental evaluation shows that CrossMorphy achieves rather high quality of word processing.
Acknowledgements
We would like to thank the reviewers of our paper for their helpful comments.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Bolshakova, E.I., Sapin, A.S. (2018). A Morphological Processor for Russian with Extended Functionality. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2017. Lecture Notes in Computer Science, vol. 10716. Springer, Cham. https://doi.org/10.1007/978-3-319-73013-4_3
DOI: https://doi.org/10.1007/978-3-319-73013-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73012-7
Online ISBN: 978-3-319-73013-4
eBook Packages: Computer Science (R0)