Abstract
In this paper, we propose a hybrid approach for extracting keywords and keyphrases from text resources and documents in Kazakh. Applying the statistical tf-idf method directly is not an optimal way to extract keywords and keyphrases in Kazakh, because Kazakh is an agglutinative language: a single stem appears under many suffixed surface forms. We therefore developed a stemming algorithm that takes the grammatical features of Kazakh into account and used it in the pre-processing stage. During extraction, we also take into account the syntactic features of words and phrases, using a morphological analyzer for Kazakh. We additionally apply a set of restrictions during extraction, since not every word can be a keyword. When choosing keywords or keyphrases, their features are considered (for example, a numeral combined with a noun may be selected). Extracting keywords and keyphrases specifically for Kazakh is a pressing task for classification, clustering, text summarization, and information retrieval. The results of the research indicate that the presented approach is the best solution for extracting keywords and keyphrases from texts in Kazakh.
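To illustrate why stemming before tf-idf matters for an agglutinative language, here is a minimal Python sketch. It is not the authors' algorithm. The names `TOY_SUFFIXES`, `toy_stem`, and `tf_idf` are hypothetical, and the suffix list is an assumption that covers only a few Kazakh plural endings. A real stemmer would handle case, possessive, and personal suffixes and would be combined with the morphological filtering described above.

```python
import math
from collections import Counter

# Assumption: a tiny illustrative list of Kazakh plural endings only.
# This is NOT the stemming algorithm from the paper.
TOY_SUFFIXES = ("лар", "лер", "дар", "дер", "тар", "тер")


def toy_stem(word, min_stem=3):
    """Strip one listed suffix if at least min_stem characters remain."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def tf_idf(docs):
    """Compute tf-idf over stems.

    docs: list of documents, each a list of word tokens.
    Returns one {stem: score} dict per document, where
    tf = count / document length and idf = ln(N / document frequency).
    """
    stemmed = [[toy_stem(token.lower()) for token in doc] for doc in docs]
    n_docs = len(stemmed)

    # Document frequency: in how many documents each stem occurs.
    df = Counter()
    for doc in stemmed:
        df.update(set(doc))

    scores = []
    for doc in stemmed:
        counts = Counter(doc)
        total = len(doc)
        scores.append(
            {
                stem: (count / total) * math.log(n_docs / df[stem])
                for stem, count in counts.items()
            }
        )
    return scores


# "кітап" (book) and "кітаптар" (books) are counted as the same stem.
docs = [["кітап", "кітаптар", "оқу"], ["мектеп", "оқу"]]
scores = tf_idf(docs)
ranked = sorted(scores[0], key=scores[0].get, reverse=True)
```

Without the stemming step, the singular and plural forms would be scored as two unrelated terms, which splits the frequency mass of a single concept across its surface forms.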
Acknowledgments
The study was supported by the Ministry of Education and Science of the Republic of Kazakhstan within the framework of the AP05132950 and AP08052421 scientific projects.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Rakhimova, D., Turganbayeva, A. (2020). Approach to Extract Keywords and Keyphrases of Text Resources and Documents in the Kazakh Language. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds) Computational Collective Intelligence. ICCCI 2020. Lecture Notes in Computer Science, vol 12496. Springer, Cham. https://doi.org/10.1007/978-3-030-63007-2_56
DOI: https://doi.org/10.1007/978-3-030-63007-2_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63006-5
Online ISBN: 978-3-030-63007-2
eBook Packages: Computer Science, Computer Science (R0)