A Systematic Analysis of Sentence Update Detection for Temporal Summarization

Gârbacea, Cristina; Kanoulas, Evangelos

doi:10.1007/978-3-319-56608-5_33


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Included in the following conference series:

European Conference on Information Retrieval


Abstract

Temporal summarization algorithms filter large volumes of streaming documents and emit sentences that constitute salient event updates. Existing systems typically combine traditional retrieval and document summarization algorithms in an ad-hoc fashion to filter sentences inside documents. Retrieval and summarization algorithms, however, were developed to operate on static document collections, so a deep understanding of their limitations when applied to a temporal summarization task is necessary. In this work we present a systematic analysis of the methods used to retrieve update sentences in temporal summarization. We demonstrate the limitations and potential of these methods by examining the retrievability and centrality of event updates, as well as the existence of intrinsic characteristics that distinguish update from non-update sentences.


Notes

1. TREC TS focuses on large events with a wide impact, such as natural catastrophes (storms, earthquakes), conflicts (bombings, protests, riots, shootings) and accidents.
2. In total we extract 8,471 unigrams and 1,169,276 bigrams using the log-likelihood ratio weighting scheme.
3. We discard event types for which there is not enough annotated data available.
4. http://trec-kba.org/kba-stream-corpus-2014.shtml.
5. Word2Vec was trained on the set of gold standard updates from the TREC TS 2013 and TREC TS 2014 collections.
6. No documents were released for event 7, hence the white row in the heatmap.
7. For events 14, 21, 24 and 25 we cannot report on any centrality scores across relevant documents due to the size of the data and the inability of LexRank to handle it; hence the white rows in the heatmap in columns (B). The average values for the precision measures below the heatmap are computed excluding these events.


Acknowledgements

This research was supported by the Dutch national program COMMIT. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Author information

Authors and Affiliations

University of Michigan, Ann Arbor, Michigan, USA
Cristina Gârbacea
University of Amsterdam, Amsterdam, The Netherlands
Evangelos Kanoulas


Corresponding author

Correspondence to Evangelos Kanoulas.

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, United Kingdom
Joemon M Jose
TU Delft - EWI/ST/WIS, Delft, The Netherlands
Claudia Hauff
Middle East Technical University, Ankara, Turkey
Ismail Sengor Altıngovde
Open University, Milton Keynes, United Kingdom
Dawei Song
Signal Media, London, United Kingdom
Dyaa Albakour
Toronto, Canada
Stuart Watt
JohnTait.net Ltd. and BCS IRSG, Sunderland, United Kingdom
John Tait


About this paper

Cite this paper

Gârbacea, C., Kanoulas, E. (2017). A Systematic Analysis of Sentence Update Detection for Temporal Summarization. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_33


DOI: https://doi.org/10.1007/978-3-319-56608-5_33
Published: 08 April 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56607-8
Online ISBN: 978-3-319-56608-5
eBook Packages: Computer Science, Computer Science (R0)
