Scale channel attention network for image segmentation

Multimedia Tools and Applications

Abstract

Variation in object scale degrades image segmentation performance. The spatial pyramid pooling module and the attention mechanism are two widely used components in deep neural networks for handling this problem, but applying either component alone typically yields limited benefit. To push the limit, we propose a scale channel attention network (SCA-Net), which enhances the fused multi-scale features by using channel attention components. After the multi-scale pooling step, the multi-scale spatial information is distributed across different feature channels. The channel attention block is then employed to guide SCA-Net to focus on the object-relevant scale channels. We further explore the channel attention block and find a simple yet effective structure that combines global average pooling and global maximum pooling, resulting in a robust global information encoder. SCA-Net does not require any time-consuming post-processing, that is, an extra step applied after the neural network to refine the segmentation result. On the PASCAL VOC 2012 and Cityscapes benchmarks, SCA-Net achieves test set performance of 75.5% and 77.0%, respectively.
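The idea of pooling at several scales and then reweighting the resulting channels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the two pooled descriptors are combined or what layers follow, so the element-wise sum, the two-layer bottleneck with reduction ratio `r`, the sigmoid gate, the bin sizes, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def multi_scale_pool(features, bins=(1, 2, 4)):
    """Average-pool an (H, W, C) map to each bin size, upsample back by
    repetition, and concatenate along the channel axis.
    Assumes H and W are divisible by every bin size."""
    H, W, C = features.shape
    levels = [features]
    for b in bins:
        # Split each spatial axis into b blocks and average inside each block.
        pooled = features.reshape(b, H // b, b, W // b, C).mean(axis=(1, 3))
        # Nearest-neighbour upsampling back to (H, W, C).
        up = np.repeat(np.repeat(pooled, H // b, axis=0), W // b, axis=1)
        levels.append(up)
    return np.concatenate(levels, axis=-1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(features, w1, w2):
    """Reweight channels using a descriptor built from global average
    pooling plus global max pooling (the sum is an assumed combination)."""
    avg = features.mean(axis=(0, 1))           # (C,) global average pooling
    mx = features.max(axis=(0, 1))             # (C,) global maximum pooling
    descriptor = avg + mx                      # assumed fusion of the two
    hidden = np.maximum(descriptor @ w1, 0.0)  # bottleneck layer with ReLU
    weights = sigmoid(hidden @ w2)             # per-channel gate in (0, 1)
    return features * weights, weights

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 4))      # toy backbone output
fused = multi_scale_pool(features)             # (8, 8, 16): 4 scale groups
C, r = fused.shape[-1], 4                      # r is a hypothetical ratio
w1 = rng.standard_normal((C, C // r)) * 0.1    # random stand-in weights
w2 = rng.standard_normal((C // r, C)) * 0.1
reweighted, weights = channel_attention(fused, w1, w2)
```

In this sketch each group of four channels in `fused` corresponds to one pooling scale, so the learned gate in `weights` can emphasize or suppress a whole scale. In a trained network `w1` and `w2` would be learned rather than random.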

Notes

  1. https://www.tensorflow.org/

Acknowledgements

This work was partly supported by the National Key Research and Development Program of China (2017YFB0803301) and the Major Scientific and Technological Special Project of Guizhou Province (20183001).

Author information

Corresponding author

Correspondence to Youliang Tian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, J., Tian, Y., Ma, W. et al. Scale channel attention network for image segmentation. Multimed Tools Appl 80, 16473–16489 (2021). https://doi.org/10.1007/s11042-020-08921-7

  • DOI: https://doi.org/10.1007/s11042-020-08921-7
