Abstract
In today’s digital epoch, people share and read a never-ending stream of electronic information of every kind. As a result, either a great deal of time is spent working through all of it, or only a tiny fraction of it is actually read. It is therefore imperative to devise a generic text summarization technique. In this paper, we propose a web-based, domain-independent automatic text summarization method. The method generates a summary of arbitrary length by extracting semantically important information from the document and assigning it scores, based on an analysis of term frequencies and on the tagging of proper nouns and signal words. Another important characteristic of our approach is that it also takes the font semantics of the text (such as headings and emphasized text) into account when scoring the different entities of the document.
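The scoring idea described in the abstract can be sketched as a small extractive summarizer. This is a minimal illustration under stated assumptions, not the authors' implementation. The stop-word and signal-word lists, the feature weights (`PROPER_NOUN_BONUS`, `SIGNAL_WORD_BONUS`, `FONT_BONUS`), the length normalization, and the capitalization heuristic that stands in for a real part-of-speech tagger are all hypothetical. Headings and emphasized spans are passed in as plain strings, on the assumption that they have already been extracted from the page markup.

```python
import re
from collections import Counter

# Hypothetical word lists; the paper's actual lists are not given in the abstract.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
              "is", "are", "was", "were", "it", "this", "that", "with", "as", "by", "at"}
SIGNAL_WORDS = {"important", "significant", "therefore", "conclusion", "finally", "notably"}

# Hypothetical feature weights, chosen only for illustration.
PROPER_NOUN_BONUS = 1.0
SIGNAL_WORD_BONUS = 2.0
FONT_BONUS = 1.5


def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return alphabetic word tokens (apostrophes allowed)."""
    return re.findall(r"[A-Za-z']+", sentence)


def summarize(text, n_sentences, headings=(), emphasized=()):
    """Return the n highest-scoring sentences, in their original document order."""
    sentences = split_sentences(text)
    # Document-wide term frequencies, ignoring stop words.
    tf = Counter(w.lower() for s in sentences for w in tokenize(s)
                 if w.lower() not in STOP_WORDS)
    # Terms that appear in headings or emphasized spans receive a font bonus.
    font_terms = {w.lower() for span in (*headings, *emphasized)
                  for w in tokenize(span)} - STOP_WORDS

    scores = []
    for s in sentences:
        words = tokenize(s)
        score = 0.0
        for i, w in enumerate(words):
            lw = w.lower()
            if lw in STOP_WORDS:
                continue
            score += tf[lw]
            # Crude proper-noun heuristic: a capitalized word that does not start the sentence.
            if i > 0 and w[0].isupper():
                score += PROPER_NOUN_BONUS
            if lw in SIGNAL_WORDS:
                score += SIGNAL_WORD_BONUS
            if lw in font_terms:
                score += FONT_BONUS
        # Normalize by sentence length so long sentences are not automatically favored.
        scores.append(score / max(len(words), 1))

    # Keep the top-n sentence indices, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

For example, `summarize(text, 3, headings=["London Marathon"])` would return up to three sentences, favoring those that share vocabulary with the heading. Because `n_sentences` is a free parameter, the summary length is arbitrary, and restoring document order keeps the extracted sentences readable as a passage.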
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sharma, A.D., Deep, S. (2014). Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization. In: Gupta, P., Zaroliagis, C. (eds) Applied Algorithms. ICAA 2014. Lecture Notes in Computer Science, vol 8321. Springer, Cham. https://doi.org/10.1007/978-3-319-04126-1_17
DOI: https://doi.org/10.1007/978-3-319-04126-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04125-4
Online ISBN: 978-3-319-04126-1
eBook Packages: Computer Science, Computer Science (R0)