Abstract
As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a two-layer sentence-word graph algorithm combined with keyword density to generate multi-document summaries, called Graph & Keywordρ. Traditional graph methods for multi-document summarization consider the influence of sentences and words only across all documents rather than within individual documents. Therefore, we construct multiple word graphs and extract the appropriate keywords from each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because word importance differs across documents, we propose using keyword density so that summaries provide rich content with a small number of words. The experimental results show that the Graph & Keywordρ method outperforms state-of-the-art systems when tested on the DUC2004 data set.
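The abstract does not give the formulas behind the per-document word graphs or the keyword density, so the following is only a minimal sketch under stated assumptions. It assumes a co-occurrence word graph per document, a plain PageRank-style ranking to pick keywords, and keyword density defined as the fraction of a sentence's words that are keywords. The function names, the window size, and the damping factor are illustrative choices, not the authors' definitions, and no stop-word filtering is applied.

```python
import re
from collections import defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_graph(sentences, window=2):
    """Build an undirected co-occurrence graph for ONE document.

    Assumption: two distinct words are linked when they appear within
    `window` positions of each other in the same sentence.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        words = tokenize(sentence)
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w].add(v)
                    graph[v].add(w)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank


def top_keywords(sentences, k=3):
    """Return the k highest-ranked words of one document (ties broken alphabetically)."""
    rank = pagerank(word_graph(sentences))
    return set(sorted(rank, key=lambda w: (-rank[w], w))[:k])


def keyword_density(sentence, keywords):
    """Assumed definition: share of the sentence's tokens that are keywords."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(w in keywords for w in words) / len(words)
```

In such a sketch, `top_keywords` would be called once per document, and `keyword_density` could then be used to favour sentences that carry many keywords in few words when adjusting sentence scores.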
Cite this article
Ye, F., Xu, X. Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs. J. Shanghai Jiaotong Univ. (Sci.) 23, 584–592 (2018). https://doi.org/10.1007/s12204-018-1957-2
DOI: https://doi.org/10.1007/s12204-018-1957-2