Learning-based high-efficiency compression framework for light field videos

Multimedia Tools and Applications

Abstract

The massive volume of data in light field (LF) content poses a major challenge for efficient compression. Several LF video compression methods reported in the literature focus on designing efficient prediction structures. However, the number of possible prediction structures is infinite, and these methods fail to fully exploit the intrinsic geometry between the views of an LF video. In this paper, we propose a deep learning-based high-efficiency LF video compression framework that exploits the inherent geometrical structure of LF videos. The proposed framework is composed of several key components: sparse coding based on a universal view sampling method (UVSM) and a CNN-based LF view synthesis algorithm (LF-CNN); a high-efficiency adaptive prediction structure (APS); and a synthesized candidate reference (SCR)-based inter-frame prediction strategy. Specifically, instead of encoding all the views in an LF video, only a subset of the views is compressed, while the remaining views are reconstructed from the encoded views with LF-CNN. The prediction structure of the selected views adapts to the similarity between views. Inspired by the effectiveness of view synthesis algorithms, synthesized results serve as additional candidate references to further reduce inter-frame redundancy. Experimental results show that the proposed LF video compression framework achieves average bitrate savings of over 34% against state-of-the-art LF video compression methods across multiple LF video datasets.
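The two ideas of encoding only a subset of views and choosing references by inter-view similarity can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the regular grid sampling is a hypothetical stand-in for UVSM, mean absolute difference is a stand-in similarity measure, and the function names `sample_views`, `mean_abs_diff`, and `pick_reference` do not come from the paper. The CNN-based synthesis step (LF-CNN) is not modelled.

```python
from itertools import product


def sample_views(rows, cols, step=2):
    """Split an angular grid of views into encoded and synthesized sets.

    Hypothetical regular sampling: keep every `step`-th view in each
    angular direction. The paper's UVSM may choose views differently.
    """
    all_views = list(product(range(rows), range(cols)))
    encoded = [(r, c) for r, c in all_views if r % step == 0 and c % step == 0]
    synthesized = [v for v in all_views if v not in encoded]
    return encoded, synthesized


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pick_reference(target, candidates):
    """Return the key of the candidate view most similar to `target`.

    Illustrates a similarity-driven reference choice; it is not the
    paper's adaptive prediction structure (APS).
    """
    return min(candidates, key=lambda k: mean_abs_diff(target, candidates[k]))


if __name__ == "__main__":
    encoded, synthesized = sample_views(3, 3)
    print("encoded views:", encoded)
    print("views left for synthesis:", synthesized)

    # Toy pixel data: view "b" is much closer to the target than view "a".
    candidates = {"a": [0, 0, 0], "b": [9, 10, 11]}
    print("chosen reference:", pick_reference([10, 10, 10], candidates))
```

In a 3 × 3 grid with `step=2`, the four corner views are encoded and the remaining five would be left for a synthesis network to reconstruct at the decoder.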

Acknowledgements

The work of Bing Wang was partially supported by the China Scholarship Council (CSC) under Grant 201707000093.

Author information

Corresponding author

Correspondence to Bing Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, B., Xiang, W., Wang, E. et al. Learning-based high-efficiency compression framework for light field videos. Multimed Tools Appl 81, 7527–7560 (2022). https://doi.org/10.1007/s11042-022-11955-8

  • DOI: https://doi.org/10.1007/s11042-022-11955-8
