SPEM: Self-adaptive Pooling Enhanced Attention Module for Image Recognition

Zhong, Shanshan; Wen, Wushao; Qin, Jinghui

doi:10.1007/978-3-031-27818-1_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13834)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Recently, many effective attention modules have been proposed to boost model performance by exploiting the internal information of convolutional neural networks in computer vision. In general, many previous works overlook the design of the pooling strategy of the attention mechanism because they take global average pooling for granted, which hinders further improvement of the attention mechanism's performance. However, we empirically find and verify that a simple linear combination of global max-pooling and global min-pooling can produce effective pooling strategies that match or exceed the performance of global average pooling. Based on this empirical observation, we propose a simple yet effective attention module, SPEM, which adopts a self-adaptive pooling strategy based on global max-pooling and global min-pooling, together with a lightweight module for producing the attention map. The effectiveness of SPEM is demonstrated by extensive experiments on widely used benchmark datasets and popular attention networks.
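
The pooling idea in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function names, the fixed coefficients `w_max` and `w_min`, and the sigmoid gate with a scalar `scale` and `bias` are all assumptions made for illustration. The abstract only states that the pooling is a self-adaptive combination of global max-pooling and global min-pooling and that a lightweight module produces the attention map.

```python
import math

def global_max_min(channel):
    """Return (max, min) over all spatial positions of one H x W channel."""
    values = [v for row in channel for v in row]
    return max(values), min(values)

def combined_pool(feature_map, w_max=0.5, w_min=0.5):
    """Pool each channel as w_max * max + w_min * min.

    w_max and w_min are hypothetical fixed coefficients; a self-adaptive
    variant would presumably learn them instead of fixing them by hand.
    """
    pooled = []
    for channel in feature_map:
        hi, lo = global_max_min(channel)
        pooled.append(w_max * hi + w_min * lo)
    return pooled

def channel_attention(feature_map, w_max=0.5, w_min=0.5, scale=1.0, bias=0.0):
    """Rescale every channel by sigmoid(scale * pooled + bias).

    The sigmoid gate is an assumed stand-in for the lightweight module,
    which the abstract does not specify.
    """
    pooled = combined_pool(feature_map, w_max, w_min)
    gates = [1.0 / (1.0 + math.exp(-(scale * p + bias))) for p in pooled]
    return [
        [[g * v for v in row] for row in channel]
        for g, channel in zip(gates, feature_map)
    ]

# Two channels of a 2 x 2 feature map.
x = [
    [[1.0, 3.0], [2.0, 4.0]],   # max 4, min 1
    [[-2.0, 0.0], [1.0, 2.0]],  # max 2, min -2
]
print(combined_pool(x))  # [2.5, 0.0]
```

With equal weights the combination reduces to the midpoint of each channel's range. Other weight pairs give other pooling statistics, which is the degree of freedom a self-adaptive strategy could exploit.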

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant No. 62206314 and Grant No. U1711264, and by the GuangDong Basic and Applied Basic Research Foundation under Grant No. 2022A1515011835.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Shanshan Zhong, Wushao Wen & Jinghui Qin

Corresponding author

Correspondence to Wushao Wen.

Editor information

Editors and Affiliations

University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
Dublin City University, Dublin, Ireland
Cathal Gurrin
Radboud University Nijmegen, Nijmegen, The Netherlands
Martha Larson
Dublin City University, Dublin, Ireland
Alan F. Smeaton
University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
National Institute of Information and Communications Technology, Tokyo, Japan
Minh-Son Dao
Department of Information Science and Media Studies, University of Bergen, Bergen, Norway
Christoph Trattner
La Trobe University, Melbourne, VIC, Australia
Phoebe Chen

About this paper

Cite this paper

Zhong, S., Wen, W., Qin, J. (2023). SPEM: Self-adaptive Pooling Enhanced Attention Module for Image Recognition. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13834. Springer, Cham. https://doi.org/10.1007/978-3-031-27818-1_4

DOI: https://doi.org/10.1007/978-3-031-27818-1_4
Published: 31 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27817-4
Online ISBN: 978-3-031-27818-1
eBook Packages: Computer Science, Computer Science (R0)
