Abstract
In information retrieval systems, it is important that documents are indexed with appropriate terms. In this paper, we propose a simple retrieval model based on the distribution characteristics of terms within a document, in addition to their term frequency. We characterize the distribution of a keyword using a single statistic, the standard deviation. Terms with high term frequency and high standard deviation are extracted as document keywords. Terms with high term frequency and low standard deviation are defined as paragraph keywords. With the proposed retrieval model, many documents or knowledge sources can be searched using both the document keywords and the paragraph keywords.
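The classification rule described in the abstract can be illustrated with a minimal sketch. The abstract does not say what quantity the standard deviation is taken over, so this example assumes it is the population standard deviation of a term's token positions, normalized by document length. The function name `classify_keywords` and the thresholds `min_tf` and `spread_threshold` are hypothetical choices for illustration, not values from the paper.

```python
from collections import defaultdict
from statistics import pstdev


def classify_keywords(tokens, min_tf=3, spread_threshold=0.2):
    """Split frequent terms into document and paragraph keywords.

    Assumption (not from the paper): a term's distribution is measured as
    the population standard deviation of its token positions, each divided
    by the document length so that values are comparable across documents.
    """
    n = len(tokens)
    positions = defaultdict(list)
    for i, term in enumerate(tokens):
        positions[term].append(i / n)  # normalized position in [0, 1)

    document_keywords, paragraph_keywords = [], []
    for term, pos in positions.items():
        if len(pos) < min_tf:
            continue  # low term frequency: not a keyword candidate
        if pstdev(pos) >= spread_threshold:
            document_keywords.append(term)   # frequent and widely spread
        else:
            paragraph_keywords.append(term)  # frequent but locally clustered
    return document_keywords, paragraph_keywords


# Toy document: "index" recurs throughout, "stem" clusters in one place.
tokens = ["index", "a", "b", "stem", "stem", "stem", "c", "index",
          "d", "e", "f", "index", "g", "h", "i", "index"]
doc_kw, par_kw = classify_keywords(tokens)
print(doc_kw, par_kw)
```

In this toy input, "index" is frequent and spread across the whole token sequence, while "stem" is equally eligible by frequency but confined to a short run. A practical version would also need stop-word removal and stemming before counting, which this sketch omits.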
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, JW., Baik, DK. (2004). A Model for Extracting Keywords of Document Using Term Frequency and Distribution. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2004. Lecture Notes in Computer Science, vol 2945. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24630-5_53
DOI: https://doi.org/10.1007/978-3-540-24630-5_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21006-1
Online ISBN: 978-3-540-24630-5
eBook Packages: Springer Book Archive