Towards 6DoF live video streaming system for immersive media

  • 1190: Depth-Related Processing and Applications in Visual Systems
Multimedia Tools and Applications

Abstract

Beyond the three rotational degrees of freedom provided by VR (about the X, Y and Z axes), 6DoF video lets the viewer freely control the viewing position as well, for six degrees of freedom (6DoF) in total. When watching a sports game, the audience is no longer limited by the position of the camera and can freely choose the viewing angle and position, just as in the real world, which greatly improves immersion. However, the major barrier to industrializing live 6DoF video is its extremely high computational complexity: multi-view depth estimation and Depth Image Based Rendering (DIBR) in particular are difficult to realize. Moreover, existing devices do not have hardware interfaces that support multi-view coding. We therefore need new techniques for depth estimation and virtual view synthesis, and we need to use existing hardware encoding/decoding interfaces to reduce power consumption. In this paper, we present a 6DoF live video system that includes a multi-view depth estimation technique based on unsupervised learning, real-time virtual viewpoint rendering, and 6DoF video coding. Experimental results demonstrate that our proposed acceleration method speeds up the original depth estimation algorithm by more than 34x and the original DIBR algorithm by more than 168x. With our 6DoF video coding method, the average bitrate savings are 70%, 64%, 33%, 60% and 66% for the AVC, HEVC, AV1, AVS3 and VVC codec standards, respectively.
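
To make the DIBR step named in the abstract more concrete, the following is a minimal sketch of textbook forward warping for a single image row. It is not the authors' accelerated algorithm. It assumes rectified pinhole cameras, so a pixel at depth Z shifts horizontally by the disparity focal_px * baseline / Z. The function names `warp_row` and `fill_holes`, the sign convention for `baseline`, and the nearest-neighbor hole filling are all illustrative assumptions.

```python
def warp_row(colors, depths, focal_px, baseline, hole=None):
    """Forward-warp one image row to a horizontally shifted virtual camera.

    Assumption: rectified cameras, so disparity = focal_px * baseline / depth.
    Hypothetical sign convention: a positive baseline moves content to the left.
    """
    width = len(colors)
    out = [hole] * width
    zbuf = [float("inf")] * width  # nearest depth seen so far at each target pixel
    for x, (color, z) in enumerate(zip(colors, depths)):
        disparity = focal_px * baseline / z
        xt = int(round(x - disparity))
        # Z-buffer test: when two source pixels land on the same target,
        # the one closer to the camera wins.
        if 0 <= xt < width and z < zbuf[xt]:
            out[xt] = color
            zbuf[xt] = z
    return out


def fill_holes(row, hole=None):
    """Crude disocclusion filling: copy the nearest valid pixel on the left,
    then fill any remaining leading holes from the right."""
    filled = list(row)
    last = hole
    for i, value in enumerate(filled):
        if value is hole:
            filled[i] = last
        else:
            last = value
    nxt = hole
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is hole:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


# A near object (depth 2) in front of a far background (depth 10).
colors = ["a", "b", "c", "d", "e", "f"]
depths = [10, 10, 2, 2, 10, 10]
warped = warp_row(colors, depths, focal_px=4, baseline=1)
print(warped)
print(fill_holes(warped))
```

In this toy row the near pixels shift by two positions and overwrite the background they land on, while the positions they leave behind become holes. Practical DIBR pipelines typically fill such holes from other reference views or from background content rather than from the nearest neighbor.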

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61672063, 61902008) and Shenzhen Research Projects JCYJ20180503182128089 and 201806080921419290. It is also partially supported by the project "PCL Future Greater-Bay Area Network Facilities for Large-scale Experiments and Applications" (LZC0019). We thank Hisense for providing the experimental platform and data evaluation.

Author information

Corresponding author

Correspondence to Yangang Cai.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (MP4 6875 kb)

Supplementary file2 (TS 87369 kb)

About this article

Cite this article

Cai, Y., Gao, X., Chen, W. et al. Towards 6DoF live video streaming system for immersive media. Multimed Tools Appl 81, 35875–35898 (2022). https://doi.org/10.1007/s11042-021-11589-2

  • DOI: https://doi.org/10.1007/s11042-021-11589-2
