Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization

Sharma, Arjun Datt; Deep, Shaleen

doi:10.1007/978-3-319-04126-1_17

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8321)

Included in the following conference series:

International Conference on Applied Algorithms

Abstract

In today’s digital age, people share and read a seemingly endless stream of electronic information. As a result, either a great deal of time is spent deciphering it, or only a small fraction of it is actually read, so a generic text summarization technique is needed. In this paper, we propose a web-based, domain-independent method for automatic text summarization. The method generates a summary of arbitrary length by extracting semantically important information from the document and assigning scores to it, based on term-frequency analysis and on tagging certain categories of words, such as proper nouns and signal words. The approach also takes the font semantics of the text (such as headings and emphasized text) into account when scoring the different entities of the document.
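The scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word and signal-word lists, the bonus weights, the capitalization heuristic standing in for proper-noun tagging, and the `emphasized` parameter standing in for font semantics are all assumptions made for illustration.

```python
import re
from collections import Counter

# All constants below are illustrative assumptions, not values from the paper.
STOPWORDS = {"a", "an", "the", "at", "of", "in", "on", "and", "or", "to", "is", "was"}
SIGNAL_WORDS = {"important", "significant", "therefore", "conclusion", "summary"}
PROPER_NOUN_BONUS = 1.0
SIGNAL_BONUS = 1.5
EMPHASIS_BONUS = 2.0


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return the alphabetic tokens of a sentence, preserving case."""
    return re.findall(r"[A-Za-z']+", sentence)


def score_sentence(sentence, term_freq, emphasized):
    """Score a sentence by term frequency plus bonuses, normalized by its length."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    score = 0.0
    for i, word in enumerate(words):
        lower = word.lower()
        if lower in STOPWORDS:
            continue
        score += term_freq[lower]
        # Crude stand-in for proper-noun tagging: capitalized and not sentence-initial.
        if i > 0 and word[0].isupper():
            score += PROPER_NOUN_BONUS
        if lower in SIGNAL_WORDS:
            score += SIGNAL_BONUS
        # Stand-in for font semantics: words the caller marks as headed or emphasized.
        if lower in emphasized:
            score += EMPHASIS_BONUS
    return score / len(words)


def summarize(text, n_sentences, emphasized=()):
    """Return the n highest-scoring sentences, in their original document order."""
    sentences = split_sentences(text)
    term_freq = Counter(
        w.lower() for s in sentences for w in tokenize(s) if w.lower() not in STOPWORDS
    )
    emph = {w.lower() for w in emphasized}
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], term_freq, emph),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Dogs bark at cats. Birds sing."
    print(summarize(sample, 1, emphasized=["birds"]))
```

Because `n_sentences` is a free parameter, the sketch yields a summary of arbitrary length, as the abstract describes. Dividing by sentence length is one possible design choice that keeps long sentences from dominating; a real system would more plausibly use a part-of-speech tagger for proper nouns and read emphasis from the page markup.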

Author information

Authors and Affiliations

PEC University of Technology, Chandigarh, India, 160012
Arjun Datt Sharma & Shaleen Deep

Editor information

Editors and Affiliations

Computer Science and Engineering, Heritage Institute of Technology, Chowbaga Road, Anandapur, 700107, Kolkata, India
Prosenjit Gupta
Department of Computer Engineering and Informatics, University of Patras, 26500, Patras, Greece
Christos Zaroliagis

About this paper

Cite this paper

Sharma, A.D., Deep, S. (2014). Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization. In: Gupta, P., Zaroliagis, C. (eds) Applied Algorithms. ICAA 2014. Lecture Notes in Computer Science, vol 8321. Springer, Cham. https://doi.org/10.1007/978-3-319-04126-1_17

DOI: https://doi.org/10.1007/978-3-319-04126-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04125-4
Online ISBN: 978-3-319-04126-1
eBook Packages: Computer Science, Computer Science (R0)