Approach to Extract Keywords and Keyphrases of Text Resources and Documents in the Kazakh Language

Rakhimova, Diana; Turganbayeva, Aliya

doi:10.1007/978-3-030-63007-2_56

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12496)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

In this paper, the authors propose a hybrid approach for extracting keywords and keyphrases from text resources and documents in Kazakh. Because Kazakh is an agglutinative language, directly applying the statistical tf-idf method is not an optimal way to extract keywords and keyphrases. The authors developed a stemming algorithm that takes the grammatical features of Kazakh into account and applied it during pre-processing. Extraction also takes into account the syntactic features of words and phrases, obtained with a morphological analyzer for Kazakh. Restrictions specified by the authors are applied during extraction as well, since not every word can be a keyword. When keywords or keyphrases are chosen, their features are considered (for example, some numerals combined with a noun are selected). Extracting keywords and keyphrases specifically for Kazakh is a pressing task for text classification, clustering, summarization, and information retrieval. The results of the research indicate that the presented approach is the best solution for extracting keywords and keyphrases from texts in Kazakh.
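To make the role of stemming before tf-idf concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical `toy_stem` function that strips only a small illustrative subset of Kazakh plural suffixes, and it omits the morphological analyzer and the selection restrictions mentioned in the abstract. The function names `toy_stem`, `tf_idf`, and `top_keywords` are invented for this illustration.

```python
import math
from collections import Counter

# Illustrative subset of Kazakh plural suffixes (an assumption for this
# sketch; the paper's stemmer covers far more of the grammar).
SUFFIXES = ("лар", "лер", "дар", "дер", "тар", "тер")


def toy_stem(word, min_stem=3):
    """Strip one known suffix if the remaining stem keeps min_stem letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def tf_idf(docs):
    """Return one {stem: tf-idf score} dict per whitespace-tokenized document."""
    stemmed = [[toy_stem(w) for w in doc.lower().split()] for doc in docs]
    n_docs = len(stemmed)
    # Document frequency: in how many documents each stem occurs.
    df = Counter(stem for doc in stemmed for stem in set(doc))
    scores = []
    for doc in stemmed:
        tf = Counter(doc)
        scores.append(
            {s: (c / len(doc)) * math.log(n_docs / df[s]) for s, c in tf.items()}
        )
    return scores


def top_keywords(docs, k=3):
    """Return the k highest-scoring stems for each document."""
    return [sorted(s, key=s.get, reverse=True)[:k] for s in tf_idf(docs)]
```

Without the stemming step, inflected forms such as "кітап" and "кітаптар" would be counted as unrelated terms, which splits their frequency across surface forms. This is the kind of effect that makes plain tf-idf a poor fit for an agglutinative language.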

Acknowledgments

The study was supported by the Ministry of Education and Science of the Republic of Kazakhstan within the framework of the AP05132950 and AP08052421 scientific projects.

Author information

Authors and Affiliations

Al-Farabi Kazakh National University, Almaty, Kazakhstan
Diana Rakhimova & Aliya Turganbayeva

Corresponding author

Correspondence to Diana Rakhimova.

Editor information

Editors and Affiliations

Department of Applied Informatics, Wrocław University of Science and Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Thua Thien Hue Center of Information Technology, Hue, Vietnam
Bao Hung Hoang
Vietnam - Korea University of Information and Communication Technology, University of Da Nang, Da Nang, Vietnam
Cong Phap Huynh
Department of Computer Engineering, Yeungnam University, Gyeungsan, Korea (Republic of)
Dosam Hwang
Department of Applied Informatics, Wrocław University of Science and Technology, Wroclaw, Poland
Bogdan Trawiński
Department of Information Systems, University of Münster, Münster, Germany
Gottfried Vossen

About this paper

Cite this paper

Rakhimova, D., Turganbayeva, A. (2020). Approach to Extract Keywords and Keyphrases of Text Resources and Documents in the Kazakh Language. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_56

DOI: https://doi.org/10.1007/978-3-030-63007-2_56
Published: 23 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2
eBook Packages: Computer Science, Computer Science (R0)
