VSMCNN-dynamic summarization of videos using salient features from multi-CNN model

Nair, Madhu S.; Mohan, Jesna

doi:10.1007/s12652-022-04112-4

Original Research
Published: 25 June 2022

Volume 14, pages 14071–14080 (2023)

Journal of Ambient Intelligence and Humanized Computing

Abstract

A dynamic video summarization system detects the key parts of an input video to generate a compact representation of it, and such summaries support efficient management of video data. This paper proposes Video Summarization based on a Multi-CNN model (VSMCNN), an approach that exploits major aspects of human cognition to generate meaningful summaries. Because the method targets dynamic summarization, the input video is first divided into a set of shots. A multi-CNN model, which combines several pre-trained CNNs, extracts features from each shot. Salient features are then selected from the resulting high-dimensional feature vector by an unsupervised feature reduction technique that ranks the features across multiple subspaces. Finally, the distance between feature vectors is thresholded to detect the most important parts of the video. Experiments on the SumMe dataset show that the approach detects the portions of the tested videos that carry an essential message, and the analysis shows that it outperforms state-of-the-art methods in the literature. A further comparison with the human-generated summaries in the ground truth confirms the effectiveness of the proposed method. The paper also analyzes in detail which combination of pre-trained CNN models is best suited to generating dynamic summaries.
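The abstract outlines a pipeline of shot segmentation, multi-CNN feature extraction, multi-subspace feature ranking, and distance thresholding. The sketch below illustrates the last three stages on toy data; it is not the authors' implementation. It assumes per-shot feature vectors have already been extracted by each pretrained CNN (shot segmentation and backbone inference are out of scope), it replaces the paper's unsupervised multi-subspace feature reduction with a simple stand-in (random subspaces that vote for their highest-variance features), and it assumes the thresholded distance is between consecutive shots with a hypothetical mean-plus-alpha-standard-deviation threshold. All function and parameter names (concat_multi_cnn, rank_features_multi_subspace, detect_key_shots, top_k, alpha) are hypothetical.

```python
import math
import random
from statistics import mean, pvariance
from typing import List, Sequence

Vector = List[float]


def concat_multi_cnn(per_model: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Concatenate per-shot features from several CNNs into one vector per shot.

    per_model[m][s] is the (already extracted) feature vector of shot s from
    model m; this layout is a hypothetical convention for the sketch.
    """
    n_shots = len(per_model[0])
    return [[x for model in per_model for x in model[s]] for s in range(n_shots)]


def rank_features_multi_subspace(
    X: Sequence[Vector],
    n_subspaces: int = 50,
    subspace_size: int = 4,
    top_k: int = 2,
    seed: int = 0,
) -> List[int]:
    """Rank feature indices by random-subspace voting (a stand-in, not the paper's method).

    Each random subspace votes for its top_k highest-variance features; a
    feature's score is its vote count divided by how often it was sampled.
    """
    rng = random.Random(seed)
    d = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(d)]
    votes = [0] * d
    seen = [0] * d
    for _ in range(n_subspaces):
        subspace = rng.sample(range(d), min(subspace_size, d))
        for j in subspace:
            seen[j] += 1
        for j in sorted(subspace, key=lambda k: variances[k], reverse=True)[:top_k]:
            votes[j] += 1
    scores = [votes[j] / seen[j] if seen[j] else 0.0 for j in range(d)]
    # Stable sort: features with equal scores keep ascending index order.
    return sorted(range(d), key=lambda j: scores[j], reverse=True)


def detect_key_shots(
    X: Sequence[Vector], selected: Sequence[int], alpha: float = 1.0
) -> List[int]:
    """Flag shots whose distance to the previous shot exceeds mean + alpha * std.

    Assumes at least two shots; the first shot has no predecessor and is never flagged.
    """
    reduced = [[row[j] for j in selected] for row in X]
    dists = [math.dist(reduced[i], reduced[i - 1]) for i in range(1, len(reduced))]
    threshold = mean(dists) + alpha * math.sqrt(pvariance(dists))
    return [i for i, dist in enumerate(dists, start=1) if dist > threshold]


if __name__ == "__main__":
    # Toy features: "model A" gives 3 values per shot, "model B" gives 2; shot 3 is an outlier.
    model_a = [[1.0, 0.5, 0.0], [1.1, 0.5, 0.1], [0.9, 0.5, 0.0],
               [5.0, 0.5, 4.0], [1.0, 0.5, 0.1], [1.0, 0.5, 0.0]]
    model_b = [[0.2, 3.0], [0.2, 3.1], [0.3, 2.9],
               [0.2, 9.0], [0.2, 3.0], [0.3, 3.0]]
    X = concat_multi_cnn([model_a, model_b])
    ranking = rank_features_multi_subspace(X)
    print("feature ranking:", ranking)
    print("key shots:", detect_key_shots(X, ranking[:3]))
```

On this toy input the constant feature never receives a vote, and shots 3 and 4 are flagged: consecutive-shot distances fire on both the entry into and the exit from an outlier shot, so a fuller system would likely merge or post-filter adjacent detections. The paper's actual ranking technique and threshold rule may differ from these stand-ins.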

Acknowledgements

This work is supported by Cochin University of Science and Technology (CUSAT) through the Seed Money for New Research Initiatives (SMNRI) project (File No. PL.(UGC)1/SPG/SMNRI/2018-19 dated 14.11.2018).

Author information

Authors and Affiliations

Artificial Intelligence & Computer Vision Lab, Department of Computer Science, Cochin University of Science and Technology, Kochi, Kerala, 682022, India
Madhu S. Nair
Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Nalanchira, Thiruvananthapuram, Kerala, 695015, India
Jesna Mohan

Corresponding author

Correspondence to Madhu S. Nair.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nair, M.S., Mohan, J. VSMCNN-dynamic summarization of videos using salient features from multi-CNN model. J Ambient Intell Human Comput 14, 14071–14080 (2023). https://doi.org/10.1007/s12652-022-04112-4

Received: 08 October 2020
Accepted: 06 June 2022
Published: 25 June 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s12652-022-04112-4
