Abstract
3D movies and videos have become increasingly popular, but they are usually produced by professionals. This paper presents a new technique for automatic 2D-to-3D video conversion based on RGB-D sensors, which can be easily used by ordinary users. One approach to generating a 3D image is to combine the original 2D color image with its corresponding depth map through depth image-based rendering (DIBR). An RGB-D sensor is an inexpensive way to capture an image together with its depth map, but the quality of the depth map and of the DIBR algorithm are crucial to this process. Our approach is therefore twofold. First, depth maps captured directly by RGB-D sensors are generally of poor quality: many regions, especially near the edges of objects, lack depth information. We propose a new RGB-D-sensor-based depth-map inpainting method that divides the regions with missing depth into interior holes and border holes, and inpaints each type of hole with a different scheme. Second, we propose an improved hole-filling approach for DIBR that synthesizes the 3D images from the color images and the inpainted depth maps. Extensive experiments on different evaluation datasets show the effectiveness of our method.
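To make the DIBR step concrete, the following is a minimal sketch (not the paper's algorithm) of the core warping idea: each color pixel is shifted horizontally by a disparity derived from its depth, and pixels that receive no source color become disocclusion holes that a hole-filling stage must then inpaint. The function name, the linear depth-to-disparity mapping, and the `max_disparity` parameter are illustrative assumptions.

```python
import numpy as np

def dibr_shift(color, depth, max_disparity=32):
    """Naive DIBR sketch: warp the color image horizontally by a disparity
    proportional to inverse depth, returning the virtual view and a mask of
    disocclusion holes (pixels that received no color)."""
    h, w, _ = color.shape
    # Normalize depth to [0, 1]; nearer pixels (smaller depth) shift more.
    d = depth.astype(np.float64)
    d = (d - d.min()) / max(d.max() - d.min(), 1e-9)
    disparity = np.round((1.0 - d) * max_disparity).astype(int)

    virtual = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xv = x + disparity[y, x]
            if 0 <= xv < w:
                virtual[y, xv] = color[y, x]
                filled[y, xv] = True
    # Holes are exactly the un-filled pixels; a real pipeline would now
    # inpaint them (the second contribution of the paper).
    return virtual, ~filled
```

A per-pixel Python loop is used only for clarity; a practical implementation would vectorize the warp and resolve collisions by depth ordering (nearer pixels overwrite farther ones).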
Acknowledgment
This study was supported by the research grants FDCT079/2016/A2 from the Science and Technology Development Fund of Macao SAR, MYRG2017-00218-FST, and MYRG2018-00111-FST.
Cite this article
Pan, B., Zhang, L., Yin, H. et al. An automatic 2D to 3D video conversion approach based on RGB-D images. Multimed Tools Appl 80, 19179–19201 (2021). https://doi.org/10.1007/s11042-021-10662-0