
An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13020)


Abstract

Image compression aims to reduce the amount of image data without degrading perceived visual quality. However, the information lost during compression may degrade downstream machine vision tasks such as object detection and semantic segmentation. How to compress images so that they serve both human viewing and machine vision tasks remains an open problem. In this paper, we propose a multi-task framework for image compression and semantic segmentation. Specifically, we design an end-to-end mutual enhancement network that compresses a given image efficiently and simultaneously segments its semantic content. First, a uniform feature learning strategy jointly learns the features for image compression and semantic segmentation in the encoder, and a multi-scale aggregation module in the encoder enhances the semantic features. Then, the quantized features are transmitted, and both the decompressed image features and the learned semantic features are reconstructed from them. Finally, the decoder uses this information for both tasks. On the one hand, the decompressed semantic features are used to perform semantic segmentation in the decoder. On the other hand, the resulting semantic segmentation map is used to further improve the quality of the decompressed image. Experimental results show that our framework effectively supports image compression and semantic segmentation simultaneously, in both subjective and objective evaluations.
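The data flow described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' network: every function name, the random linear maps standing in for learned layers, the average pooling, and the additive class-conditioned correction are assumptions made purely for illustration. It only shows the order of operations: shared encoding with a multi-scale step, quantization, segmentation from the received features, and image refinement guided by the segmentation map.

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, k):
    """Average-pool an (H, W, C) array with a k x k window and stride k."""
    h, w, c = x.shape
    x = x[: h - h % k, : w - w % k]
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsampling of an (H, W, C) array by factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def encode(image, w_enc):
    """Shared encoder: one feature map serves both tasks (hypothetical toy layers)."""
    feats = avg_pool(image, 4) @ w_enc             # (H/4, W/4, C_feat)
    # Toy stand-in for multi-scale aggregation: add coarser-scale context back in.
    context = upsample(avg_pool(feats, 2), 2)
    return feats + context

def quantize(feats):
    """Rounding stands in for the quantization applied before transmission."""
    return np.round(feats)

def decode(q, w_img, w_sem, w_ref, num_classes):
    """Decoder: segment from the received features, then refine the image."""
    up = upsample(q, 4)                            # back to input resolution
    seg_map = (up @ w_sem).argmax(axis=-1)         # (H, W) class indices
    coarse = up @ w_img                            # (H, W, 3) first-pass image
    one_hot = np.eye(num_classes)[seg_map]         # (H, W, num_classes)
    refined = coarse + one_hot @ w_ref             # class-conditioned correction
    return refined, seg_map

# Assumed toy sizes: 16x16 RGB input, 8 feature channels, 5 semantic classes.
num_classes = 5
image = rng.random((16, 16, 3))
w_enc = rng.normal(size=(3, 8))
w_img = rng.normal(size=(8, 3))
w_sem = rng.normal(size=(8, num_classes))
w_ref = rng.normal(size=(num_classes, 3))

refined, seg_map = decode(quantize(encode(image, w_enc)), w_img, w_sem, w_ref, num_classes)
print(refined.shape, seg_map.shape)
```

In a trained system the linear maps would be replaced by learned convolutional layers and the rounded features would additionally be entropy coded; the sketch omits both.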

This work was supported by the National Natural Science Foundation of China (61972028, 61902022) and the Fundamental Research Funds for the Central Universities (2019JBM018, FRF-TP-19-015A1).



Author information

Corresponding author

Correspondence to Yao Zhao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, J., Yao, C., Liu, M., Zhao, Y. (2021). An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_51

  • DOI: https://doi.org/10.1007/978-3-030-88007-1_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88006-4

  • Online ISBN: 978-3-030-88007-1

  • eBook Packages: Computer Science, Computer Science (R0)
