Towards 6DoF live video streaming system for immersive media

Cai, Yangang; Gao, Xuesong; Chen, Weiqiang; Wang, Ronggang

doi:10.1007/s11042-021-11589-2

1190: Depth-Related Processing and Applications in Visual Systems
Published: 02 June 2022

Volume 81, pages 35875–35898, (2022)

Multimedia Tools and Applications

Yangang Cai (ORCID: orcid.org/0000-0002-7525-5361)¹, Xuesong Gao¹, Weiqiang Chen¹ & Ronggang Wang¹

358 Accesses
2 Citations
1 Altmetric

Abstract

In addition to the three rotational degrees of freedom provided by VR (rotation about the X, Y and Z axes), 6DoF video lets the viewer freely control the viewing point, for six degrees of freedom (6DoF) in total. When watching a sports game, the audience is no longer limited by the position of the camera and can freely choose the viewing angle and position, just as when watching in the real world, which can greatly improve the sense of immersion. However, the major barrier to industrializing 6DoF live video is its extremely high computational complexity: multi-view depth estimation and Depth Image Based Rendering (DIBR) are particularly difficult to realize. In addition, existing devices do not have hardware interfaces that support multi-view coding. Therefore, new techniques are needed for depth estimation and virtual view synthesis, and existing hardware encoding/decoding interfaces must be used to reduce power consumption. In this paper, we present a 6DoF live video system that includes a multi-view depth estimation technique based on unsupervised learning, real-time virtual viewpoint rendering, and 6DoF video coding. Experimental results demonstrate that our proposed acceleration method speeds up the original depth estimation algorithm by more than 34x and the original DIBR algorithm by more than 168x. With our 6DoF video coding method, experimental results show average bitrate savings of 70%, 64%, 33%, 60% and 66% for the AVC, HEVC, AV1, AVS3 and VVC codec standards, respectively.
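
For readers unfamiliar with DIBR, the following is a minimal sketch of the general idea only, not the accelerated method described in the paper. It assumes a rectified setup in which the virtual camera is shifted horizontally by a small baseline, so that each pixel moves by a disparity of focal length times baseline divided by depth. The function names `warp_row` and `fill_holes`, the single-row simplification, and the nearest-neighbour hole filling are all illustrative assumptions.

```python
from typing import Any, List, Optional


def warp_row(colors: List[Any], depths: List[float],
             focal_px: float, baseline: float) -> List[Optional[Any]]:
    """Forward-warp one image row to a virtual camera shifted to the right.

    Assumes rectified cameras and strictly positive depths (hypothetical
    setup). Pixels that receive no source sample stay None (holes).
    """
    width = len(colors)
    out: List[Optional[Any]] = [None] * width
    zbuf = [float("inf")] * width  # nearest depth seen at each target pixel
    for x, (color, z) in enumerate(zip(colors, depths)):
        disparity = focal_px * baseline / z   # closer pixels move further
        xt = x - int(round(disparity))        # camera moves right, content moves left
        if 0 <= xt < width and z < zbuf[xt]:  # z-test: keep the closest surface
            out[xt] = color
            zbuf[xt] = z
    return out


def fill_holes(row: List[Optional[Any]]) -> List[Optional[Any]]:
    """Fill holes from the nearest valid pixel on the right, then the left.

    With a rightward camera shift, disocclusions open on the right side of
    foreground objects, so the right neighbour is usually background.
    """
    filled = list(row)
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    return filled


# Usage: pixels "d" and "e" are close to the camera (depth 2), the rest are far.
warped = warp_row(list("abcdef"), [10, 10, 10, 2, 2, 10], focal_px=2.0, baseline=1.0)
print(warped)              # ['a', 'b', 'd', 'e', None, 'f']
print(fill_holes(warped))  # ['a', 'b', 'd', 'e', 'f', 'f']
```

In this toy row the foreground pixels shift by one position, occlude a background pixel, and leave a one-pixel hole behind them. A real renderer has to run the same per-pixel warp, depth test and hole filling over every pixel of every frame, which gives some intuition for why the abstract singles out DIBR as a computational bottleneck.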

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61672063, 61902008) and by Shenzhen Research Projects JCYJ20180503182128089 and 201806080921419290. This work is also partially supported by the project "PCL Future Greater-Bay Area Network Facilities for Large-scale Experiments and Applications" (LZC0019). Thanks to Hisense for providing the experimental platform and data evaluation.

Author information

Authors and Affiliations

Shenzhen Graduate School, Peking University, Shenzhen, Guangdong, China
Yangang Cai, Xuesong Gao, Weiqiang Chen & Ronggang Wang

Corresponding author

Correspondence to Yangang Cai.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (MP4 6875 kb)

Supplementary file2 (TS 87369 kb)

About this article

Cite this article

Cai, Y., Gao, X., Chen, W. et al. Towards 6DoF live video streaming system for immersive media. Multimed Tools Appl 81, 35875–35898 (2022). https://doi.org/10.1007/s11042-021-11589-2

Received: 07 December 2020
Revised: 05 August 2021
Accepted: 20 September 2021
Published: 02 June 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s11042-021-11589-2
