Optimizing word set coverage for multi-event summarization

Yan, Jihong; Cheng, Wenliang; Wang, Chengyu; Liu, Jun; Gao, Ming; Zhou, Aoying

doi:10.1007/s10878-015-9855-0

Published: 14 March 2015

Volume 30, pages 996–1015 (2015)

Journal of Combinatorial Optimization

Jihong Yan¹,², Wenliang Cheng¹, Chengyu Wang¹, Jun Liu³, Ming Gao¹ & Aoying Zhou⁴


Abstract

The Internet has proliferated over the past few decades, and a large amount of textual information is now generated on the Web. Individuals cannot locate and digest all the latest updates available there. Text summarization offers an efficient way to generate short, concise abstracts from massive document collections. These collections involve many events, which are hard for a summarization procedure to identify directly. We propose a novel methodology that identifies events in such text corpora and creates a summary for each event. We employ a probabilistic topic model to learn the latent topics of the documents and then discover events from the topic distributions of the documents. To target the summarization, we define the word set coverage problem (WSCP), which captures the most representative sentences for summarizing an event. To solve the WSCP, we propose an approximation algorithm for the optimization problem. We conduct a set of experiments to evaluate the proposed approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, the proposed method outperforms competitive baselines when measured by the harmonic mean of coverage and conciseness.
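
The abstract does not spell out the WSCP formulation or the approximation algorithm, so the following is only a minimal sketch of the general idea, assuming the classical greedy heuristic for set cover rather than the authors' actual method. The function names `greedy_word_cover` and `harmonic_mean`, the representation of sentences as word sets, and the toy data are all hypothetical; how coverage and conciseness are scored in the paper is not stated here.

```python
from typing import List, Set


def greedy_word_cover(sentences: List[Set[str]], target: Set[str]) -> List[int]:
    """Greedily pick sentence indices until the target words are covered.

    Assumption: standard greedy set-cover heuristic, used here as a
    stand-in for the paper's (unspecified) approximation algorithm.
    """
    if not sentences:
        return []
    uncovered = set(target)
    chosen: List[int] = []
    while uncovered:
        # Pick the sentence covering the most still-uncovered words
        # (ties resolve to the lowest index).
        best = max(range(len(sentences)),
                   key=lambda i: len(sentences[i] & uncovered))
        gain = sentences[best] & uncovered
        if not gain:
            break  # the remaining words appear in no sentence
        chosen.append(best)
        uncovered -= gain
    return chosen


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores (0.0 if both are zero)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


# Hypothetical toy event: each sentence is reduced to its set of words.
sentences = [
    {"quake", "city"},
    {"quake", "rescue", "aid"},
    {"aid"},
    {"city", "power"},
]
target = {"quake", "city", "rescue", "aid", "power"}
summary = greedy_word_cover(sentences, target)
```

In this toy case the sketch selects two of the four sentences, which together contain every target word. The harmonic mean helper only illustrates how two scores in [0, 1] can be combined so that a summary must do well on both to score well.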

Notes

http://www.sina.com.cn/.

Acknowledgments

This work is partially supported by the National Basic Research Program (973) of China (No. 2012CB316203) and NSFC under Grant Nos. 61402177, 61170838 and 61272036. The author would also like to thank Key Disciplines of Software Engineering of Shanghai Second Polytechnic University under Grant No. XXKZD1301 and Project of Shanghai Shen-kang Hospital Development Centre (No. 2014SKMR-04).

Author information

Authors and Affiliations

Institute for Data Science and Engineering, East China Normal University, Shanghai, 200062, China
Jihong Yan, Wenliang Cheng, Chengyu Wang & Ming Gao
Economic Management Institute, Shanghai Second Polytechnic University, Shanghai, 201209, China
Jihong Yan
Shanghai General Hospital, Shanghai Jiaotong University, Shanghai, 200080, China
Jun Liu
Shanghai Key Lab for Trustworthy Computing, East China Normal University, Shanghai, 200062, China
Aoying Zhou

Corresponding author

Correspondence to Ming Gao.

About this article

Cite this article

Yan, J., Cheng, W., Wang, C. et al. Optimizing word set coverage for multi-event summarization. J Comb Optim 30, 996–1015 (2015). https://doi.org/10.1007/s10878-015-9855-0

Published: 14 March 2015
Issue Date: November 2015
DOI: https://doi.org/10.1007/s10878-015-9855-0
