Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs

Journal of Shanghai Jiaotong University (Science)

Abstract

As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two-layer graph algorithm combined with keyword density to generate multi-document summaries, known as Graph & Keywordρ. Traditional graph methods for multi-document summarization consider the influence of sentences and words only across all documents rather than within individual documents. Therefore, we construct multiple word graphs and extract the right keywords from each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because the importance of words differs across documents, we propose using keyword density so that summaries provide rich content while using a small number of words. The experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC2004 data set.
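The abstract does not give the formal definition of keyword density, so the following is only a minimal sketch of the general idea under stated assumptions. Here keyword density is assumed to be the fraction of a sentence's tokens that are keywords of its source document, and per-document keywords are picked by plain term frequency as a stand-in for the paper's word-graph extraction. The function names `tokenize`, `top_keywords`, and `keyword_density` are hypothetical and do not come from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(document, k=3, stopwords=frozenset()):
    """Pick the k most frequent non-stopword tokens of ONE document.

    Assumption: frequency ranking replaces the paper's word-graph ranking.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(t for t in tokenize(document) if t not in stopwords)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:k]}


def keyword_density(sentence, keywords):
    """Assumed definition: keyword tokens divided by all tokens in the sentence."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(t in keywords for t in tokens) / len(tokens)


if __name__ == "__main__":
    doc = ("Graph methods rank sentences. Graph methods rank words. "
           "Keywords guide the graph.")
    kws = top_keywords(doc, k=3)
    # Score each sentence of the document by its keyword density.
    for sent in doc.split(". "):
        print(round(keyword_density(sent, kws), 2), sent)
```

Under this assumed definition, a short sentence packed with document-level keywords scores higher than a long sentence that mentions them only in passing, which matches the stated goal of rich content in a small number of words. How the paper combines this score with the sentence graph is not specified in the abstract and is not modeled here.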

Author information

Corresponding author

Correspondence to Xinchen Xu (徐欣辰).

About this article

Cite this article

Ye, F., Xu, X. Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs. J. Shanghai Jiaotong Univ. (Sci.) 23, 584–592 (2018). https://doi.org/10.1007/s12204-018-1957-2

  • DOI: https://doi.org/10.1007/s12204-018-1957-2
