An Experimental Analysis of Feature Selection and Similarity Assessment for Textual Summarization

  • Conference paper
Advances in Computing (CCC 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 735)

Abstract

Access to information increases every day, and we can quickly acquire knowledge from many sources, such as news websites, blogs, and social networks, so processing all of this information becomes increasingly difficult. Tools are therefore needed to automatically extract the most relevant sentences, reducing a text to a shorter version. One way to do this while preserving the core information content is a process called Automatic Text Summarization. One relevant issue in this context is the presence of typos, synonyms, and other orthographic variations, since some extractive techniques are not prepared to handle them. This work presents an evaluation of different similarity approaches to minimize these problems, selecting the most appropriate sentences to represent a document in an automatically generated summary.

Notes

  1. https://www.google.com/trends

  2. https://www.google.com/news

  3. http://plato.stanford.edu/entries/types-tokens/

Author information

Corresponding author

Correspondence to Leandro Krug Wives.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ramos, A.M.S., Woloszyn, V., Wives, L.K. (2017). An Experimental Analysis of Feature Selection and Similarity Assessment for Textual Summarization. In: Solano, A., Ordoñez, H. (eds) Advances in Computing. CCC 2017. Communications in Computer and Information Science, vol 735. Springer, Cham. https://doi.org/10.1007/978-3-319-66562-7_11

  • DOI: https://doi.org/10.1007/978-3-319-66562-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66561-0

  • Online ISBN: 978-3-319-66562-7

  • eBook Packages: Computer Science (R0)
