Abstract
3D movies and videos have become increasingly popular, but they are usually produced by professionals. This paper presents a new technique for automatic 2D-to-3D video conversion based on RGB-D sensors, which can be easily used by ordinary users. One approach to generating a 3D image is to combine the original 2D color image with its corresponding depth map through depth image-based rendering (DIBR). An RGB-D sensor is an inexpensive way to capture an image together with its depth map, but the quality of the depth map and of the DIBR algorithm are crucial to this process. Our approach is therefore twofold. First, depth maps captured directly by RGB-D sensors are generally of poor quality: many regions, especially near the edges of objects, lack depth information. We propose a new RGB-D-sensor-based depth-map inpainting method that divides the regions with missing depth into interior holes and border holes, and inpaints each type of hole with a different scheme. Second, we propose an improved hole-filling approach for DIBR that synthesizes the 3D images from the color images and the inpainted depth maps. Extensive experiments on different evaluation datasets show the effectiveness of our method.
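To make the DIBR step concrete, the following is a minimal sketch (not the paper's algorithm) of the core warping idea: each color pixel is shifted horizontally by a disparity derived from its depth, and pixels that receive no source color become disocclusion holes that a hole-filling stage must then inpaint. The function name, the linear depth-to-disparity mapping, and the `max_disparity` parameter are illustrative assumptions.

```python
import numpy as np

def dibr_shift(color, depth, max_disparity=32):
    """Naive DIBR sketch: warp the color image horizontally by a disparity
    proportional to inverse depth, returning the virtual view and a mask of
    disocclusion holes (pixels that received no color)."""
    h, w, _ = color.shape
    # Normalize depth to [0, 1]; nearer pixels (smaller depth) shift more.
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-9)
    disparity = np.round((1.0 - d) * max_disparity).astype(int)

    virtual = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xv = x + disparity[y, x]
            if 0 <= xv < w:
                virtual[y, xv] = color[y, x]
                filled[y, xv] = True
    # Holes are exactly the un-filled pixels; a real pipeline would now
    # inpaint them (the second contribution of the paper).
    return virtual, ~filled
```

A per-pixel Python loop is used only for clarity; a practical implementation would vectorize the warp and resolve collisions by depth ordering (nearer pixels overwrite farther ones).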
Acknowledgment
This study was supported by the research grants FDCT079/2016/A2 from the Science and Technology Development Fund of Macao SAR, MYRG2017-00218-FST, and MYRG2018-00111-FST.
Cite this article
Pan, B., Zhang, L., Yin, H. et al. An automatic 2D to 3D video conversion approach based on RGB-D images. Multimed Tools Appl 80, 19179–19201 (2021). https://doi.org/10.1007/s11042-021-10662-0