An Experimental Analysis of Feature Selection and Similarity Assessment for Textual Summarization

Ramos, Ana Maria Schwendler; Woloszyn, Vinicius; Wives, Leandro Krug

doi:10.1007/978-3-319-66562-7_11

Part of the book series: Communications in Computer and Information Science (CCIS, volume 735)

Included in the following conference series:

Colombian Conference on Computing

Abstract

Since access to information increases every day, and we can quickly acquire knowledge from many sources such as news websites, blogs, and social networks, processing all this information becomes increasingly difficult. Tools are therefore needed to automatically extract the most relevant sentences, reducing the text to a shorter version. One way to do this while preserving the core information content is a process called Automatic Text Summarization. A relevant issue in this context is the presence of typos, synonyms, and other orthographic variations, since some extractive techniques are not prepared to handle them. This work evaluates different similarity approaches that aim to minimize these problems when selecting the most appropriate sentences to represent a document in an automatically generated summary.
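The abstract does not say which similarity measures the study compared, so what follows is only an illustrative sketch under stated assumptions, not the authors' method. It contrasts a word-level bag-of-words cosine similarity, which treats a misspelled word as a completely different term, with a cosine similarity over character trigrams, which tolerates small orthographic variations. All function names (`word_vector`, `char_ngram_vector`, `cosine`, `summarize`) and the trigram size are hypothetical choices, and sentences are ranked by a simple sum-of-similarities centrality score. Character n-grams only help with typos and spelling variants; handling synonyms would need another resource, such as a thesaurus or word embeddings.

```python
# Illustrative sketch only: function names, the trigram size, and the
# centrality scoring are hypothetical choices, not the paper's method.
import math
import re
from collections import Counter


def word_vector(sentence: str) -> Counter:
    """Bag-of-words term frequencies over lowercased word tokens."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def char_ngram_vector(sentence: str, n: int = 3) -> Counter:
    """Character n-gram frequencies, taken inside each space-padded word.

    A single typo only changes the few n-grams that overlap it, so
    misspelled variants still share most of their features.
    """
    grams: Counter = Counter()
    for word in re.findall(r"\w+", sentence.lower()):
        padded = f" {word} "
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences: list[str], k: int, vectorize=char_ngram_vector) -> list[str]:
    """Return the k most central sentences, kept in their original order.

    Centrality is the sum of a sentence's similarity to every other
    sentence (a simplified, non-iterative relative of graph-based
    methods such as LexRank).
    """
    vectors = [vectorize(s) for s in sentences]
    scores = [
        sum(cosine(vi, vj) for j, vj in enumerate(vectors) if j != i)
        for i, vi in enumerate(vectors)
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    a = "Automatic summarization selects relevant sentences."
    b = "Automatic sumarization selects relevent sentences."  # two typos
    print(round(cosine(word_vector(a), word_vector(b)), 2))  # 0.6
    # Noticeably higher than the word-level score above:
    print(round(cosine(char_ngram_vector(a), char_ngram_vector(b)), 2))
```

With the two example sentences, the word-level score is 0.6 because only three of the five tokens match exactly, while the trigram score is clearly higher because the misspelled words still share most of their trigrams. Passing `vectorize=word_vector` to `summarize` switches the sentence ranking back to exact word matching, which makes it easy to compare the two representations on the same input.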

Author information

Authors and Affiliations

PPGC, Instituto de Informática, UFRGS, Porto Alegre, Brazil
Ana Maria Schwendler Ramos, Vinicius Woloszyn & Leandro Krug Wives

Corresponding author

Correspondence to Leandro Krug Wives.

Editor information

Editors and Affiliations

Universidad Autónoma de Occidente, Cali, Colombia
Andrés Solano
Universidad de San Buenaventura, Cali, Colombia
Hugo Ordoñez

About this paper

Cite this paper

Ramos, A.M.S., Woloszyn, V., Wives, L.K. (2017). An Experimental Analysis of Feature Selection and Similarity Assessment for Textual Summarization. In: Solano, A., Ordoñez, H. (eds) Advances in Computing. CCC 2017. Communications in Computer and Information Science, vol 735. Springer, Cham. https://doi.org/10.1007/978-3-319-66562-7_11

DOI: https://doi.org/10.1007/978-3-319-66562-7_11
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66561-0
Online ISBN: 978-3-319-66562-7
eBook Packages: Computer Science, Computer Science (R0)
