Learning-based high-efficiency compression framework for light field videos

Wang, Bing; Xiang, Wei; Wang, Eric; Peng, Qiang; Gao, Pan; Wu, Xiao

doi:10.1007/s11042-022-11955-8

Published: 28 January 2022

Volume 81, pages 7527–7560, (2022)

Multimedia Tools and Applications

Bing Wang¹,² (ORCID: orcid.org/0000-0002-4502-8686),
Wei Xiang³,
Eric Wang¹,
Qiang Peng²,
Pan Gao⁴ &
…
Xiao Wu²

Abstract

The massive volume of light field (LF) data poses major challenges for efficient compression. Several LF video compression methods reported in the literature focus on exploring efficient prediction structures. However, the number of possible prediction structures is infinite, and these methods fail to fully exploit the intrinsic geometry between the views of an LF video. In this paper, we propose a deep learning-based high-efficiency LF video compression framework that exploits the inherent geometrical structure of LF videos. The proposed framework is composed of several crucial components: sparse coding based on a universal view sampling method (UVSM) and a CNN-based LF view synthesis algorithm (LF-CNN), a high-efficiency adaptive prediction structure (APS), and a synthesized candidate reference (SCR)-based inter-frame prediction strategy. Specifically, instead of encoding all the views in an LF video, only a subset of the views is compressed, while the remaining views are reconstructed from the encoded views with LF-CNN. The prediction structure of the selected views adapts itself to the similarity between views. Inspired by the effectiveness of view synthesis algorithms, synthesized results serve as additional candidate references to further reduce inter-frame redundancies. Experimental results show that the proposed LF video compression framework achieves average bitrate savings of over 34% compared with state-of-the-art LF video compression methods across multiple LF video datasets.
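The sparse-coding idea in the abstract (encode only some views, synthesize the rest) can be illustrated with a minimal sketch. The abstract does not describe how UVSM actually selects views, so the regular-stride pattern below, the function name `split_views`, and the `stride` parameter are all assumptions made purely for illustration. They are not the paper's method.

```python
from typing import List, Tuple

View = Tuple[int, int]  # (row, col) index of a view in the angular grid


def split_views(rows: int, cols: int, stride: int = 2) -> Tuple[List[View], List[View]]:
    """Partition an angular grid of LF views into two groups.

    ASSUMPTION: a regular-stride sampling pattern stands in for the
    paper's UVSM, whose details are not given in the abstract.
    Returns (views to encode, views left for synthesis).
    """
    if rows <= 0 or cols <= 0 or stride <= 0:
        raise ValueError("rows, cols and stride must be positive")

    encoded: List[View] = []
    synthesized: List[View] = []
    for r in range(rows):
        for c in range(cols):
            # Keep a view for encoding only if it lies on the sampling lattice.
            if r % stride == 0 and c % stride == 0:
                encoded.append((r, c))
            else:
                synthesized.append((r, c))
    return encoded, synthesized


if __name__ == "__main__":
    enc, syn = split_views(5, 5)
    print(f"encode {len(enc)} views, synthesize {len(syn)} views")
```

For a hypothetical 5×5 grid with a stride of 2, this keeps 9 views for the encoder and leaves 16 to be reconstructed at the decoder, which is where a synthesis network such as LF-CNN would come in.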

Acknowledgements

The work of Bing Wang was partially supported by the China Scholarship Council (CSC) under Grant 201707000093.

Author information

Authors and Affiliations

College of Science and Engineering, James Cook University, Cairns, QLD 4878, Australia
Bing Wang & Eric Wang
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China
Bing Wang, Qiang Peng & Xiao Wu
School of Engineering and Mathematical Science, La Trobe University, Melbourne, VIC 3686, Australia
Wei Xiang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
Pan Gao

Corresponding author

Correspondence to Bing Wang.

About this article

Cite this article

Wang, B., Xiang, W., Wang, E. et al. Learning-based high-efficiency compression framework for light field videos. Multimed Tools Appl 81, 7527–7560 (2022). https://doi.org/10.1007/s11042-022-11955-8

Received: 29 June 2021
Revised: 10 September 2021
Accepted: 03 January 2022
Published: 28 January 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-11955-8