
Context-Based Rules for Grammatical Disambiguation in the Tatar Language

  • Conference paper
Computational Collective Intelligence (ICCCI 2017)

Abstract

The paper addresses the problem of grammatical ambiguity in the Tatar National Corpus and describes the methodology and software used to automate disambiguation. Grammatical ambiguity is widespread in agglutinative languages such as the Turkic and Finno-Ugric languages. Disambiguation in the corpus relies on a context-oriented classification of ambiguity types, carried out on Tatar corpus data for the first time. In this study the corpus serves both as the source for the research and as the target for implementing its results. Grammatical ambiguity types are detected automatically with a finite-state morphological analyzer and then classified. To build the grammatically disambiguated subcorpus, a dedicated software module was developed. It searches the corpus for ambiguous tokens, collects statistics on them, and lets users create and apply formal context-based disambiguation rules for different ambiguity types.
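The workflow summarized in the abstract (find ambiguous tokens, group them by ambiguity type, count them, then resolve them with context rules) can be illustrated with the minimal sketch below. It is not the authors' module: the `Token`/`Rule` classes, the `"lemma+TAG+TAG"` analysis strings, and the tag names (`N`, `Nom`, `V`, `Neg`, `Imp`, `Adj`) are hypothetical stand-ins for the real analyzer output. Here an "ambiguity type" is assumed to be the sorted set of competing tag strings, and a rule keeps one reading of a given type when its context test holds. The example uses Tatar *алма*, which can be read as a noun ("apple") or as the negative imperative of *ал-* ("take"); an adjective such as *кызыл* ("red") before it favors the noun reading.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

AmbType = Tuple[str, ...]


@dataclass
class Token:
    form: str
    # Hypothetical "lemma+TAG+TAG" strings; the real analyzer format may differ.
    analyses: List[str]


def tags_of(analysis: str) -> str:
    """Strip the lemma, keeping only the grammatical tags."""
    return analysis.split("+", 1)[1] if "+" in analysis else ""


def ambiguity_type(token: Token) -> Optional[AmbType]:
    """Sorted tuple of competing tag strings, or None if the token is unambiguous."""
    if len(token.analyses) < 2:
        return None
    return tuple(sorted({tags_of(a) for a in token.analyses}))


def collect_stats(sentences: List[List[Token]]) -> Counter:
    """Count how often each ambiguity type occurs in the corpus."""
    stats: Counter = Counter()
    for sentence in sentences:
        for tok in sentence:
            t = ambiguity_type(tok)
            if t is not None:
                stats[t] += 1
    return stats


@dataclass
class Rule:
    target: AmbType                                # ambiguity type the rule handles
    keep: str                                      # tag string kept when the rule fires
    condition: Callable[[List[Token], int], bool]  # context test at position i


def apply_rules(sentence: List[Token], rules: List[Rule]) -> int:
    """Apply the first rule whose type and context match; return tokens resolved."""
    resolved = 0
    for i, tok in enumerate(sentence):
        t = ambiguity_type(tok)
        if t is None:
            continue
        for rule in rules:
            if rule.target == t and rule.condition(sentence, i):
                kept = [a for a in tok.analyses if tags_of(a) == rule.keep]
                if kept:  # never discard every reading
                    tok.analyses = kept
                    resolved += 1
                break
    return resolved


def prev_is_adj(sentence: List[Token], i: int) -> bool:
    """Hypothetical context test: the previous token has an adjective reading."""
    return i > 0 and any("Adj" in tags_of(a).split("+") for a in sentence[i - 1].analyses)


if __name__ == "__main__":
    sent = [Token("кызыл", ["кызыл+Adj"]),
            Token("алма", ["алма+N+Nom", "ал+V+Neg+Imp"])]
    print(collect_stats([sent]))
    rule = Rule(target=("N+Nom", "V+Neg+Imp"), keep="N+Nom", condition=prev_is_adj)
    print(apply_rules(sent, [rule]), sent[1].analyses)  # prints: 1 ['алма+N+Nom']
```

Counting types before writing rules mirrors the abstract's order of steps: the statistics show which ambiguity types are frequent enough to be worth a dedicated rule, and the guard in `apply_rules` leaves a token ambiguous rather than empty when a rule's kept reading is absent.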



Correspondence to Bulat Khakimov.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gataullin, R., Khakimov, B., Suleymanov, D., Gilmullin, R. (2017). Context-Based Rules for Grammatical Disambiguation in the Tatar Language. In: Nguyen, N., Papadopoulos, G., Jędrzejowicz, P., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2017. Lecture Notes in Computer Science, vol 10449. Springer, Cham. https://doi.org/10.1007/978-3-319-67077-5_51


  • DOI: https://doi.org/10.1007/978-3-319-67077-5_51


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67076-8

  • Online ISBN: 978-3-319-67077-5

  • eBook Packages: Computer Science (R0)
