Progressive decomposition: a method of coarse-to-fine image parsing using stacked networks

Sun, Yunhan; Hu, Jiagao; Shi, Jinlong; Sun, Zhengxing

doi:10.1007/s11042-019-08288-4

Published: 29 January 2020

Multimedia Tools and Applications, volume 79, pages 13379–13402 (2020)

Yunhan Sun¹, Jiagao Hu², Jinlong Shi¹ & Zhengxing Sun² (ORCID: orcid.org/0000-0001-7137-6169)

Abstract

Off-the-shelf semantic segmentation networks struggle to parse images into fine-grained semantic parts when the elements are complex, because it is difficult for them to exploit the contextual information of fine-grained parts. In this paper we propose a progressive decomposition method that parses images in a coarse-to-fine manner with progressively refined semantic classes. The method has two components: stacked networks and progressive supervision. The stacked network is built by stacking several segmentation layers in a segmentation network. Each earlier segmentation module parses the image at a coarser-grained level, and its result is fed to the following module to provide effective contextual clues for finer-grained parsing. Skip connections from shallow layers of the network to the fine-grained parsing modules are also added to recover the details of small structures. To train the stacked networks, which produce coarse-to-fine outputs, we propose a progressive supervision strategy: classes in the ground truth are merged to obtain coarse-to-fine label maps, and the stacked network is then trained end-to-end with these hierarchical supervisions. The proposed framework can be injected into many advanced neural networks to improve their parsing results. Extensive evaluations on several public datasets, covering face parsing and human parsing, demonstrate the superiority of our method.
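The progressive supervision idea in the abstract (merging ground-truth classes to obtain coarse-to-fine label maps, then supervising every level) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class names, the three merge tables, the equal loss weights, and the function names `coarse_to_fine_labels` and `hierarchical_loss` are all hypothetical.

```python
import numpy as np

# Hypothetical fine-grained label set (illustrative only, not from the paper).
FINE_CLASSES = ["background", "skin", "left_eye", "right_eye", "nose", "mouth", "hair"]

# Hypothetical merge tables: index = fine class id, value = merged class id.
MERGE_TABLES = [
    np.array([0, 1, 1, 1, 1, 1, 1]),  # coarsest level: background vs. head
    np.array([0, 1, 2, 2, 3, 4, 5]),  # middle level: left and right eye merged
    np.arange(len(FINE_CLASSES)),     # finest level: identity mapping
]


def coarse_to_fine_labels(fine_map, merge_tables=MERGE_TABLES):
    """Return one label map per level by merging classes of the fine label map."""
    return [table[fine_map] for table in merge_tables]


def cross_entropy(probs, labels, eps=1e-12):
    """Mean per-pixel cross-entropy; probs has shape (H, W, C), labels (H, W)."""
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    picked = probs[rows, cols, labels]  # probability assigned to the true class
    return float(-np.log(picked + eps).mean())


def hierarchical_loss(probs_per_level, fine_map, weights=None):
    """Sum the per-level losses; equal weighting is an assumption of this sketch."""
    targets = coarse_to_fine_labels(fine_map)
    weights = weights or [1.0] * len(targets)
    return sum(
        w * cross_entropy(p, t)
        for w, p, t in zip(weights, probs_per_level, targets)
    )


# Usage: a tiny 2x3 fine label map and uniform predictions at each level.
fine_map = np.array([[0, 1, 2], [3, 6, 5]])
levels = coarse_to_fine_labels(fine_map)
uniform = [np.full((2, 3, c), 1.0 / c) for c in (2, 6, 7)]
loss = hierarchical_loss(uniform, fine_map)
```

In a real stacked network each entry of `probs_per_level` would be the softmax output of one segmentation module, so the summed loss lets gradients from every level reach the shared layers during end-to-end training.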

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Nos. 2018YFC0309100 and 2018YFC0309104), the National Natural Science Foundation of China (Nos. 61321491 and 61272219), the National High Technology Research and Development Program of China (No. 2007AA01Z334), the Program for New Century Excellent Talents in University of China (NCET-04-04605), the China Postdoctoral Science Foundation (Grant No. 2017M621700) and the Innovation Fund of State Key Laboratory for Novel Software Technology (Nos. ZZKT2013A12, ZZKT2016A11 and ZZKT2018A09).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
Yunhan Sun & Jinlong Shi
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210046, China
Jiagao Hu & Zhengxing Sun

Corresponding authors

Correspondence to Jinlong Shi or Zhengxing Sun.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, Y., Hu, J., Shi, J. et al. Progressive decomposition: a method of coarse-to-fine image parsing using stacked networks. Multimed Tools Appl 79, 13379–13402 (2020). https://doi.org/10.1007/s11042-019-08288-4

Received: 02 November 2018
Revised: 06 June 2019
Accepted: 30 September 2019
Published: 29 January 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11042-019-08288-4
