Leveraging semantic saliency maps for query-specific video summarization

Cizmeciler, Kemal; Erdem, Erkut; Erdem, Aykut

doi:10.1007/s11042-022-12442-w

Published: 07 March 2022

Multimedia Tools and Applications
Volume 81, pages 17457–17482 (2022)

Abstract

The immense number of videos being uploaded to video sharing platforms makes it impossible for a person to watch all of them and understand what happens in them. Hence, machine learning techniques are now deployed to index videos by recognizing key objects, actions, and scenes or places. Summarization is an alternative, as it extracts only the important parts while covering the gist of the video content. Ideally, the user may prefer to analyze a certain action or scene by searching for a query term within the video. Current summarization methods generally do not take queries into account, or they require exhaustive data labeling. In this work, we present a weakly supervised query-focused video summarization method. Our proposed approach uses semantic attributes as an indicator of query relevance and semantic attention maps to locate related regions in the frames, and it utilizes both within a submodular maximization framework. We conducted experiments on the recently introduced RAD dataset and obtained highly competitive results. Moreover, to better evaluate the performance of our approach on longer videos, we collected a new dataset, which consists of 10 videos from YouTube annotated with multiple shot-level attributes. Our dataset enables a much more diverse set of queries, which can be used to summarize a video from different perspectives with more degrees of freedom.
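
As a rough illustration of what selecting shots within a submodular maximization framework can look like, the sketch below runs a standard greedy selection over shots. It is not the authors' objective: the per-shot query relevance scores, the shot similarity matrix, the weight `lam`, and the function names `objective` and `greedy_summary` are all assumptions made for this example, and the coverage term is a generic facility-location function swapped in for whatever terms the paper actually uses.

```python
from typing import List, Sequence


def objective(selected: Sequence[int],
              relevance: Sequence[float],
              similarity: Sequence[Sequence[float]],
              lam: float) -> float:
    """Score a set of shots: summed query relevance plus weighted coverage.

    The relevance sum is modular and the facility-location coverage term is
    monotone submodular, so for non-negative inputs the total is monotone
    submodular as well.
    """
    if not selected:
        return 0.0
    rel = sum(relevance[i] for i in selected)
    # Each shot in the video is "covered" by its most similar selected shot.
    cov = sum(max(row[i] for i in selected) for row in similarity)
    return rel + lam * cov


def greedy_summary(relevance: Sequence[float],
                   similarity: Sequence[Sequence[float]],
                   budget: int,
                   lam: float = 1.0) -> List[int]:
    """Greedily add the shot with the largest marginal gain, up to `budget` shots."""
    selected: List[int] = []
    current = 0.0
    candidates = set(range(len(relevance)))
    while candidates and len(selected) < budget:
        best_gain, best_shot = 0.0, None
        for i in sorted(candidates):
            gain = objective(selected + [i], relevance, similarity, lam) - current
            if gain > best_gain:
                best_gain, best_shot = gain, i
        if best_shot is None:  # no remaining shot improves the objective
            break
        selected.append(best_shot)
        candidates.remove(best_shot)
        current += best_gain
    return sorted(selected)  # report the summary in temporal order


# Hypothetical toy input: shots 0 and 1 are near-duplicates that both match the query.
relevance = [0.9, 0.85, 0.1, 0.6]
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.2, 0.3, 1.0],
]
summary = greedy_summary(relevance, similarity, budget=2)
```

On this toy input the second pick is shot 3 rather than shot 1: shot 1 has the higher relevance score, but it adds almost no coverage once shot 0 is in the summary. That trade-off between query relevance and redundancy is the reason a submodular objective suits summarization.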

Notes

https://hucvl.github.io/query-specific-summarization/.
For additional qualitative results, please refer to the project website at https://hucvl.github.io/query-specific-summarization.

Acknowledgments

This work was supported in part by the GEBIP 2018 Award of the Turkish Academy of Sciences to E. Erdem and the BAGEP 2021 Award of the Science Academy to A. Erdem.

Author information

Authors and Affiliations

Department of Computer Engineering, Hacettepe University, Ankara, Turkey
Kemal Cizmeciler & Erkut Erdem
Department of Computer Engineering, Koç University, Istanbul, Turkey
Aykut Erdem

Corresponding author

Correspondence to Erkut Erdem.

About this article

Cite this article

Cizmeciler, K., Erdem, E. & Erdem, A. Leveraging semantic saliency maps for query-specific video summarization. Multimed Tools Appl 81, 17457–17482 (2022). https://doi.org/10.1007/s11042-022-12442-w

Received: 20 April 2021
Revised: 12 July 2021
Accepted: 25 January 2022
Published: 07 March 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11042-022-12442-w
