Abstract
A dynamic video summarization system detects the key parts of an input video and generates a compact representation of it. Such summaries support efficient management of video data. This paper proposes Video Summarization based on Multi-CNN model (VSMCNN), an approach that exploits major aspects of human cognition to generate meaningful summaries from videos. Because the method targets dynamic summarization, the input video is first divided into a set of shots. A multi-CNN model, which combines several pre-trained CNN models, extracts features from the shots. Salient features are then selected from the resulting high-dimensional feature vector by an unsupervised feature reduction technique that is applied in multiple subspaces to rank the features in the vector. The distance between feature vectors is then thresholded to detect the prime parts of the tested video. Experiments on the SumMe dataset show that the approach successfully detects the portions of the tested video that carry an essential message. The analysis shows that the method outperforms state-of-the-art methods in the literature. Further evaluation against the human-generated summaries in the ground truth confirms the effectiveness of the proposed method. The paper also presents a detailed analysis of which combination of pre-trained CNN models is best suited for generating dynamic summaries.
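The last two stages of the pipeline described above (feature reduction followed by distance thresholding) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each shot is already represented by one feature vector, uses plain variance ranking as a simple stand-in for the multi-subspace unsupervised feature reduction, and assumes Euclidean distance to the most recently kept shot as the distance measure. The function names, the `keep` parameter, and the threshold value are all hypothetical.

```python
from typing import List

import numpy as np


def select_top_variance_features(features: np.ndarray, keep: int) -> np.ndarray:
    """Keep the `keep` feature columns with the highest variance across shots.

    Assumption: variance ranking is only a stand-in for the paper's
    multi-subspace unsupervised feature reduction.
    """
    variances = features.var(axis=0)
    top = np.argsort(variances)[::-1][:keep]
    # Sort the chosen column indices so the original feature order is preserved.
    return features[:, np.sort(top)]


def select_key_shots(features: np.ndarray, threshold: float) -> List[int]:
    """Return indices of shots that differ enough from the last kept shot.

    Assumption: Euclidean distance and a fixed threshold; the first shot
    is always kept as the initial reference.
    """
    selected = [0]
    for i in range(1, len(features)):
        dist = np.linalg.norm(features[i] - features[selected[-1]])
        if dist > threshold:
            selected.append(i)
    return selected


# Toy example: five shots, three features per shot (synthetic values).
shot_features = np.array([
    [0.0, 0.0, 5.0],
    [0.0, 0.1, 5.0],
    [3.0, 4.0, 5.0],
    [3.0, 4.1, 5.0],
    [0.0, 0.0, 5.0],
])
reduced = select_top_variance_features(shot_features, keep=2)
key_shots = select_key_shots(reduced, threshold=1.0)
print(key_shots)
```

In this toy input the third feature is constant across shots, so the variance ranking drops it, and the threshold keeps only the shots where the content changes noticeably. In practice the per-shot vectors would come from concatenating the outputs of several pre-trained CNNs, and the threshold would need tuning on the data.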
Acknowledgements
This work is supported by Cochin University of Science and Technology (CUSAT) through the Seed Money for New Research Initiatives (SMNRI) project (File No. PL.(UGC)1/SPG/SMNRI/2018-19 dated 14.11.2018).
Cite this article
Nair, M.S., Mohan, J. VSMCNN-dynamic summarization of videos using salient features from multi-CNN model. J Ambient Intell Human Comput 14, 14071–14080 (2023). https://doi.org/10.1007/s12652-022-04112-4
DOI: https://doi.org/10.1007/s12652-022-04112-4