An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation

Chen, Junru; Yao, Chao; Liu, Meiqin; Zhao, Yao

doi:10.1007/978-3-030-88007-1_51

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13020)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Image compression aims to reduce the size of image data without degrading perceived visual quality. However, the information lost during compression can impair downstream machine vision tasks such as object detection and semantic segmentation. How to compress images so that they serve both human viewing and machine vision tasks remains an open problem. In this paper, we propose a multi-task framework for image compression and semantic segmentation. Specifically, we design an end-to-end mutual enhancement network that compresses the given image and segments its semantic content simultaneously. First, a uniform feature learning strategy jointly learns the features for image compression and semantic segmentation in the encoder, and a multi-scale aggregation module in the encoder enhances the semantic features. Then, the quantized features are transmitted, and both the decompressed image features and the learned semantic features are reconstructed from them. Finally, the decoder uses this information for both tasks: the decompressed semantic features are used to perform semantic segmentation, and the resulting segmentation map is used to further improve the quality of the decompressed image. Experimental results show that the framework supports image compression and semantic segmentation simultaneously, in both subjective and objective evaluations.
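The data flow described in the abstract can be illustrated with a small toy sketch. It is not the authors' network: average pooling stands in for the learned encoder, mixing two pooling scales stands in for the multi-scale aggregation module, uniform rounding stands in for quantization, and a per-segment mean blend stands in for the segmentation-guided enhancement. All function names, the step size, and the 0.5 threshold are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
Symbols = List[List[int]]


def avg_pool(x: Grid, k: int) -> Grid:
    """Average-pool a 2-D grid with a k x k window and stride k."""
    return [
        [sum(x[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
         for j in range(0, len(x[0]), k)]
        for i in range(0, len(x), k)
    ]


def upsample(x: Grid, k: int) -> Grid:
    """Nearest-neighbour upsampling by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in x for _ in range(k)]


def encode(image: Grid) -> Tuple[Grid, Grid]:
    """Toy shared encoder (assumes height and width are divisible by 4)."""
    shared = avg_pool(image, 2)                # stand-in for learned image features
    context = upsample(avg_pool(image, 4), 2)  # coarser scale on the same grid
    # Stand-in for multi-scale aggregation: mix fine and coarse responses.
    semantic = [[0.5 * (a + b) for a, b in zip(fine, coarse)]
                for fine, coarse in zip(shared, context)]
    return shared, semantic


def quantize(x: Grid, step: float) -> Symbols:
    """Uniform quantization to integer symbols (what would be transmitted)."""
    return [[round(v / step) for v in row] for row in x]


def dequantize(q: Symbols, step: float) -> Grid:
    """Map integer symbols back to approximate feature values."""
    return [[s * step for s in row] for row in q]


def decode(img_q: Symbols, sem_q: Symbols, step: float) -> Tuple[Grid, List[List[int]]]:
    """Toy decoder: segment first, then use the map to refine the image."""
    image_hat = upsample(dequantize(img_q, step), 2)
    sem_hat = upsample(dequantize(sem_q, step), 2)
    # Hypothetical two-class segmentation by thresholding semantic features.
    seg_map = [[1 if v > 0.5 else 0 for v in row] for row in sem_hat]
    # Hypothetical enhancement: pull each pixel toward the mean of its segment.
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for img_row, seg_row in zip(image_hat, seg_map):
        for v, c in zip(img_row, seg_row):
            sums[c] = sums.get(c, 0.0) + v
            counts[c] = counts.get(c, 0) + 1
    enhanced = [[0.5 * v + 0.5 * sums[c] / counts[c] for v, c in zip(img_row, seg_row)]
                for img_row, seg_row in zip(image_hat, seg_map)]
    return enhanced, seg_map


if __name__ == "__main__":
    image = [[0.1, 0.1, 0.9, 0.9] for _ in range(4)]  # dark left, bright right
    img_feat, sem_feat = encode(image)
    step = 0.25
    enhanced, seg_map = decode(quantize(img_feat, step), quantize(sem_feat, step), step)
    print(seg_map)
    print(enhanced)
```

The sketch only mirrors the ordering of the stages: one encoder pass produces both feature sets, only quantized symbols cross the channel, and the decoder produces the segmentation map before using it to refine the reconstructed image.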

This work was supported by the National Natural Science Foundation of China (61972028, 61902022) and the Fundamental Research Funds for the Central Universities (2019JBM018, FRF-TP-19-015A1).

Author information

Authors and Affiliations

Institute of Information Science, Beijing Jiaotong University, Beijing, 100044, China
Junru Chen, Meiqin Liu & Yao Zhao
Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing Jiaotong University, Beijing, 100044, China
Junru Chen, Meiqin Liu & Yao Zhao
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Chao Yao

Corresponding author

Correspondence to Yao Zhao.

Editor information

Editors and Affiliations

University of Science and Technology Beijing, Beijing, China
Huimin Ma
Chinese Academy of Sciences, Beijing, China
Liang Wang
Tsinghua University, Beijing, China
Changshui Zhang
Zhejiang University, Hangzhou, China
Fei Wu
Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hunan University, Changsha, China
Yaonan Wang
Sun Yat-Sen University, Guangzhou, Guangdong, China
Jianhuang Lai
Beijing Jiaotong University, Beijing, China
Yao Zhao

About this paper

Cite this paper

Chen, J., Yao, C., Liu, M., Zhao, Y. (2021). An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_51

DOI: https://doi.org/10.1007/978-3-030-88007-1_51
Published: 22 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88006-4
Online ISBN: 978-3-030-88007-1
eBook Packages: Computer Science, Computer Science (R0)
