EM Clustering Algorithm for Automatic Text Summarization

  • Conference paper
Advances in Artificial Intelligence (MICAI 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7094)

Abstract

Automatic text summarization has emerged as a technique for accessing only the useful information. To assess the quality of the automatic summaries produced by a system, DUC 2002 (Document Understanding Conference) developed a standard set of human summaries, called the gold collection, for 567 single news documents. In this conference, only five systems could outperform the baseline heuristic in the single-document extractive summarization task. So far, some approaches have obtained good results by combining different strategies with language-dependent knowledge. In this paper, we present a competitive method based on an EM clustering algorithm for improving the quality of automatic summaries using practically no language-dependent knowledge. A comparison of this method with three text models is also presented.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ledeneva, Y., Hernández, R.G., Soto, R.M., Reyes, R.C., Gelbukh, A. (2011). EM Clustering Algorithm for Automatic Text Summarization. In: Batyrshin, I., Sidorov, G. (eds) Advances in Artificial Intelligence. MICAI 2011. Lecture Notes in Computer Science, vol 7094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25324-9_26

  • DOI: https://doi.org/10.1007/978-3-642-25324-9_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25323-2

  • Online ISBN: 978-3-642-25324-9

  • eBook Packages: Computer Science, Computer Science (R0)
