
Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization

  • Conference paper
Applied Algorithms (ICAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8321)

  • 1340 Accesses

Abstract

In today’s digital epoch, people share and read a never-ending variety of electronic information, so either a lot of time is wasted deciphering it or only a tiny fraction of it is actually read. A generic text summarization technique is therefore needed. In this paper, we propose a web-based, domain-independent automatic text summarization method. The method generates a summary of arbitrary length by extracting semantically important information from the document and assigning scores to it, analyzing term frequencies and tagging certain parts of speech such as proper nouns and signal words. Our approach also takes the font semantics of the text (such as headings and emphasized text) into account when scoring the different entities of the document.
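
The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word and signal-word lists, the three weights, the capitalization heuristic standing in for proper-noun tagging, and the `emphasized` argument standing in for font semantics are all assumptions made for illustration, since the abstract gives none of these details.

```python
import re
from collections import Counter

# Hypothetical word lists; the abstract does not give the ones used in the paper.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "it",
              "that", "this", "for", "on", "with", "was", "are"}
SIGNAL_WORDS = {"important", "significant", "conclusion", "therefore", "summary"}

# Assumed bonus weights, chosen only for illustration.
W_PROPER = 1.0    # capitalized word that is not sentence-initial
W_SIGNAL = 1.5    # word from SIGNAL_WORDS
W_EMPHASIS = 2.0  # word that appeared in a heading or emphasized span


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentences(text, emphasized=()):
    """Return (score, index, sentence) for every sentence in text."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    tf = Counter(w for w in words if w not in STOP_WORDS)
    emphasized = {e.lower() for e in emphasized}
    scored = []
    for idx, sent in enumerate(split_sentences(text)):
        tokens = re.findall(r"[A-Za-z']+", sent)
        score, content_words = 0.0, 0
        for pos, tok in enumerate(tokens):
            low = tok.lower()
            if low in STOP_WORDS:
                continue
            content_words += 1
            score += tf[low]                      # term-frequency component
            if pos > 0 and tok[0].isupper():
                score += W_PROPER                 # crude proper-noun proxy
            if low in SIGNAL_WORDS:
                score += W_SIGNAL
            if low in emphasized:
                score += W_EMPHASIS               # stand-in for font semantics
        # Normalize by content-word count so long sentences are not favored.
        scored.append((score / max(content_words, 1), idx, sent))
    return scored


def summarize(text, n_sentences=2, emphasized=()):
    """Pick the n highest-scoring sentences, returned in document order."""
    top = sorted(score_sentences(text, emphasized),
                 key=lambda t: t[0], reverse=True)[:n_sentences]
    return [sent for _, _, sent in sorted(top, key=lambda t: t[1])]
```

Calling `summarize(text, n_sentences=k)` gives a summary of arbitrary length `k`. In a web setting, the `emphasized` words would presumably be collected from heading and emphasis markup before scoring, though how the paper does this is not stated in the abstract.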

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sharma, A.D., Deep, S. (2014). Too Long-Didn’t Read: A Practical Web Based Approach towards Text Summarization. In: Gupta, P., Zaroliagis, C. (eds) Applied Algorithms. ICAA 2014. Lecture Notes in Computer Science, vol 8321. Springer, Cham. https://doi.org/10.1007/978-3-319-04126-1_17

  • DOI: https://doi.org/10.1007/978-3-319-04126-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04125-4

  • Online ISBN: 978-3-319-04126-1

  • eBook Packages: Computer Science, Computer Science (R0)
