Content modification of soccer videos using a supervised deep learning framework

Ghassab, Vahid Khorasani; Maanicshah, Kamal; Green, Paul; Bouguila, Nizar

doi:10.1007/s11042-021-11383-0

Published: 13 September 2021

Volume 81, pages 481–503, (2022)
Multimedia Tools and Applications

Vahid Khorasani Ghassab ORCID: orcid.org/0000-0001-7697-9364¹,
Kamal Maanicshah¹,
Paul Green² &
…
Nizar Bouguila¹

Abstract

In this paper, we propose a novel framework that automatically replaces advertisement content in soccer videos using deep learning. We begin by applying U-Net (a convolutional neural network for image segmentation) to segment and detect the content. After reconstructing the segmented content in the video frames to compensate for the apparent loss in detection, we replace the unwanted content with new content using a homography mapping procedure. The replacement key points are then tracked into subsequent frames, accounting for zoom-in and zoom-out, by multiplying the key point coordinates by the homography matrix between each pair of consecutive frames. Because moving objects in the video can disrupt the alignment between frames and thereby make the homography matrix calculation erroneous, we use the Mask R-CNN algorithm to mask and remove moving objects from the scene, so that the replacement stays consistent with the motion of the scene. We call this framework the REP-Model, short for replacing model. We evaluated the REP-Model on a large database of soccer match videos for removing and replacing playground billboard content, and the results reveal the discriminative nature of the proposed framework. Finally, to key out an object covered by the new content, we use an unsupervised approach in an adversarial learning setup: object masks are learned by playing a game of cut-and-paste, with a discriminator model judging whether the covered object has been revealed correctly.
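The tracking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the homography is estimated, so a plain direct linear transform (DLT) is assumed here, and the function names (`estimate_homography`, `apply_homography`, `keep_static`, `track_corners`) are hypothetical. The boolean `moving_mask` stands in for the output of an instance segmenter such as Mask R-CNN; producing it is outside the scope of the sketch.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from >= 4 point pairs via DLT.

    Assumption: the abstract does not name the estimator; DLT is used
    here only for illustration. Degenerate point sets are not handled.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map Nx2 pixel coordinates through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def keep_static(pts_prev, pts_next, moving_mask):
    """Drop point pairs whose previous-frame location lies on a moving object.

    moving_mask is a boolean HxW array (True = moving object), e.g. the
    union of instance masks from a segmenter; hypothetical input here.
    """
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_next = np.asarray(pts_next, dtype=float)
    cols = np.clip(np.round(pts_prev[:, 0]).astype(int), 0, moving_mask.shape[1] - 1)
    rows = np.clip(np.round(pts_prev[:, 1]).astype(int), 0, moving_mask.shape[0] - 1)
    keep = ~moving_mask[rows, cols]
    return pts_prev[keep], pts_next[keep]

def track_corners(corners, homographies):
    """Propagate replacement key points frame by frame.

    homographies[i] maps frame i to frame i + 1; the result holds the
    key point positions in every frame, starting with the input frame.
    """
    track = [np.asarray(corners, dtype=float)]
    for H in homographies:
        track.append(apply_homography(H, track[-1]))
    return track
```

In use, matched background points from two consecutive frames would first pass through `keep_static`, the surviving pairs would feed `estimate_homography`, and the resulting matrices would be chained by `track_corners` so that the billboard corners follow camera zoom and pan. A robust estimator would normally replace plain DLT when the matches contain outliers.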

Acknowledgements

The completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC). In addition, the authors would like to thank Edouard Geze, Adam Alcolado and Robert Graham for their assistance during the project.

Author information

Authors and Affiliations

Concordia University, Montreal, Canada
Vahid Khorasani Ghassab, Kamal Maanicshah & Nizar Bouguila
mtl.ai Inc., Montreal, Canada
Paul Green

Corresponding author

Correspondence to Vahid Khorasani Ghassab.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghassab, V.K., Maanicshah, K., Green, P. et al. Content modification of soccer videos using a supervised deep learning framework. Multimed Tools Appl 81, 481–503 (2022). https://doi.org/10.1007/s11042-021-11383-0

Received: 28 September 2020
Revised: 17 January 2021
Accepted: 26 July 2021
Published: 13 September 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11042-021-11383-0
