
Progressive decomposition: a method of coarse-to-fine image parsing using stacked networks

Multimedia Tools and Applications

Abstract

When images are parsed into fine-grained semantic parts, off-the-shelf semantic segmentation networks struggle with complex elements because they have difficulty exploiting the contextual information of fine-grained parts. In this paper we propose a progressive decomposition method that parses images in a coarse-to-fine manner with progressively refined semantic classes. It has two components: stacked networks and progressive supervision. The stacked network is built by stacking several segmentation layers within a segmentation network. Each earlier segmentation module parses the image at a coarser level, and its result is fed to the next module to provide effective contextual clues for finer-grained parsing. Skip connections from shallow layers of the network to the fine-grained parsing modules are also added to recover the details of small structures. To train the stacked network, whose outputs range from coarse to fine, we propose a progressive supervision strategy: classes in the ground truth are merged to obtain coarse-to-fine label maps, and the stacked network is then trained end-to-end with these hierarchical supervisions. The proposed framework can be integrated into many advanced neural networks to improve their parsing results. Extensive evaluations on several public face-parsing and human-parsing datasets demonstrate the superiority of our method.
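The progressive supervision strategy (merging ground-truth classes into coarse-to-fine label maps, then summing one loss per level) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-level face-parsing hierarchy (`FINE_TO_MID`, `MID_TO_COARSE`), the function names, the per-pixel cross-entropy loss and the equal per-level weights are all hypothetical choices made for the example. Label maps are nested lists so the sketch runs without a deep-learning framework; in a real network they would be tensors, and each per-level prediction would come from one of the stacked segmentation modules.

```python
import math
from typing import Dict, List, Optional, Sequence

LabelMap = List[List[int]]          # H x W grid of class indices
ProbMap = List[List[List[float]]]   # H x W x C per-pixel class probabilities

# Hypothetical three-level hierarchy for face parsing (NOT the paper's exact one).
# Fine classes: 0 bg, 1 skin, 2 l-brow, 3 r-brow, 4 l-eye, 5 r-eye,
#               6 nose, 7 upper lip, 8 inner mouth, 9 lower lip, 10 hair
FINE_TO_MID: Dict[int, int] = {
    0: 0, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3, 6: 4, 7: 5, 8: 5, 9: 5, 10: 6,
}
# Mid classes: 0 bg, 1 skin, 2 brows, 3 eyes, 4 nose, 5 mouth, 6 hair
MID_TO_COARSE: Dict[int, int] = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}
# Coarse classes: 0 bg, 1 face, 2 hair


def merge_labels(label_map: LabelMap, mapping: Dict[int, int]) -> LabelMap:
    """Relabel every pixel through a many-to-one class mapping."""
    return [[mapping[c] for c in row] for row in label_map]


def build_label_hierarchy(fine: LabelMap,
                          mappings: Sequence[Dict[int, int]]) -> List[LabelMap]:
    """Return label maps ordered coarse -> fine.

    `mappings` are applied fine -> coarse, one merge per level.
    """
    levels = [fine]
    for mapping in mappings:
        levels.append(merge_labels(levels[-1], mapping))
    return levels[::-1]


def pixel_cross_entropy(probs: ProbMap, labels: LabelMap,
                        eps: float = 1e-12) -> float:
    """Mean negative log-probability of the ground-truth class over all pixels."""
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, c in zip(prob_row, label_row):
            total -= math.log(max(p[c], eps))
            count += 1
    return total / count


def progressive_loss(predictions: Sequence[ProbMap],
                     label_maps: Sequence[LabelMap],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-level losses; both sequences ordered coarse -> fine."""
    if len(predictions) != len(label_maps):
        raise ValueError("need one prediction per supervision level")
    if weights is None:
        weights = [1.0] * len(predictions)
    return sum(w * pixel_cross_entropy(p, y)
               for w, p, y in zip(weights, predictions, label_maps))
```

Because every coarser label map is derived deterministically from the fine ground truth, this scheme needs no extra annotation: each stacked module receives its own target, and summing the per-level losses is what allows the whole stack to be trained end-to-end.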


Figs. 1–11



Acknowledgements

This work was supported by the National Key Research and Development Program of China (Nos. 2018YFC0309100 and 2018YFC0309104), the National Natural Science Foundation of China (Nos. 61321491 and 61272219), the National High Technology Research and Development Program of China (No. 2007AA01Z334), the Program for New Century Excellent Talents in University of China (NCET-04-04605), the China Postdoctoral Science Foundation (Grant No. 2017M621700) and the Innovation Fund of the State Key Laboratory for Novel Software Technology (Nos. ZZKT2013A12, ZZKT2016A11 and ZZKT2018A09).

Author information

Corresponding authors

Correspondence to Jinlong Shi or Zhengxing Sun.


About this article


Cite this article

Sun, Y., Hu, J., Shi, J. et al. Progressive decomposition: a method of coarse-to-fine image parsing using stacked networks. Multimed Tools Appl 79, 13379–13402 (2020). https://doi.org/10.1007/s11042-019-08288-4

  • DOI: https://doi.org/10.1007/s11042-019-08288-4