Abstract
Documents are unstructured data written in natural language. A document surrogate is the structured data into which an original document is converted so that it can be processed by computer systems, and it is usually represented as a list of words. Because not all words in a document reflect its content, the important words related to its content must be selected from among them. Such important words are called keywords, and they are conventionally selected with an equation based on TF (Term Frequency) and IDF (Inverse Document Frequency). In practice, not only TF and IDF but also the position of each word in the document and whether the word appears in the title should be considered when selecting keywords from the words in the text. An equation that combines all of these factors becomes too complicated to apply to keyword selection. This paper proposes using a neural network model, back propagation, in which these factors serve as features from which feature vectors are generated, and with which keywords are selected. The paper shows that back propagation outperforms the equation in distinguishing keywords.
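The approach outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function and class names, the scaling of each feature, the network size, and the learning rate are all assumptions chosen for illustration. Each word is mapped to a four-element feature vector (TF, IDF, relative position of first occurrence, title flag), and a small one-hidden-layer network trained by back propagation scores the word as keyword or non-keyword.

```python
import math
import random


def word_features(doc_tokens, title_tokens, corpus):
    """Map each word of one document to [tf, idf, first_position, in_title].

    corpus is a list of token lists that includes this document.
    The feature scaling here is an illustrative assumption.
    """
    n_docs = len(corpus)
    title = set(title_tokens)
    features = {}
    for word in set(doc_tokens):
        tf = doc_tokens.count(word) / len(doc_tokens)
        df = sum(1 for d in corpus if word in d)
        idf = math.log(n_docs / df) if df else 0.0
        first_pos = doc_tokens.index(word) / len(doc_tokens)
        in_title = 1.0 if word in title else 0.0
        features[word] = [tf, idf, first_pos, in_title]
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, one sigmoid output, trained by back propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One gradient step on the squared error; returns the error before the step."""
        xb = list(x) + [1.0]
        hidden, out = self.forward(x)
        # Error signals are computed before any weight is changed.
        delta_out = (out - target) * out * (1.0 - out)
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(xb):
                row[i] -= lr * delta_hidden[j] * v
        return 0.5 * (out - target) ** 2


# Toy data (hypothetical): one document, its title, and a two-document corpus.
doc = ["neural", "network", "selects", "keywords", "neural"]
title = ["neural", "keywords"]
corpus = [doc, ["network", "training"]]
feats = word_features(doc, title, corpus)

# Hypothetical labels: 1.0 marks a keyword, 0.0 a non-keyword.
samples = [(feats["neural"], 1.0), (feats["network"], 0.0)]
net = TinyMLP(n_in=4, n_hidden=3)
first_error = sum(net.train_step(x, t) for x, t in samples)
for _ in range(500):
    last_error = sum(net.train_step(x, t) for x, t in samples)
```

After training, `net.forward(feats[word])[1]` gives a score between 0 and 1 that can be thresholded to decide whether a word is treated as a keyword. A real experiment would need labeled keywords from many documents rather than the two toy samples used here.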
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jo, T. (2003). Neural Based Approach to Keyword Extraction from Documents. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_49
DOI: https://doi.org/10.1007/3-540-44839-X_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40155-1
Online ISBN: 978-3-540-44839-6
eBook Packages: Springer Book Archive