Meta-learning of Pooling Layers for Character Recognition

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

In convolutional neural network-based character recognition, pooling layers play an important role in dimensionality reduction and deformation compensation. However, their kernel shapes and pooling operations are empirically predetermined; typically, a fixed-size square kernel shape and max pooling operation are used. In this paper, we propose a meta-learning framework for pooling layers. As part of our framework, a parameterized pooling layer is proposed in which the kernel shape and pooling operation are trainable using two parameters, thereby allowing flexible pooling of the input data. We also propose a meta-learning algorithm for the parameterized pooling layer, which allows us to acquire a suitable pooling layer across multiple tasks. In the experiment, we applied the proposed meta-learning framework to character recognition tasks. The results demonstrate that a pooling layer that is suitable across character recognition tasks was obtained via meta-learning, and the obtained pooling layer improved the performance of the model in both few-shot character recognition and noisy image recognition tasks.

We provide our implementation at https://github.com/Otsuzuki/Meta-learning-of-Pooling-Layers-for-Character-Recognition.
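
The abstract does not state the layer's formula, so the following is only a minimal sketch of how a pooling layer could expose two adjustable quantities: a per-position weight mask standing in for the kernel shape, and an exponent p standing in for the pooling operation. It is not the authors' implementation (the repository above is the reference for that). The function names weighted_power_pool and pool2d, the weighted power-mean form, and the restriction to non-negative inputs are all assumptions made for illustration.

```python
def weighted_power_pool(window, weights, p):
    """Weighted power mean of one pooling window (illustrative only).

    window  : non-negative activations, e.g. post-ReLU values (assumption)
    weights : non-negative weight per position; a zero weight removes that
              position from the kernel, which changes the effective shape
    p       : exponent >= 1; p = 1 gives a weighted average, and larger p
              moves the result toward the maximum of the weighted positions
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    acc = sum(w * (x ** p) for x, w in zip(window, weights))
    return (acc / total) ** (1.0 / p)


def pool2d(image, weights, p, stride):
    """Slide the weighted kernel over a 2D list of lists."""
    kh, kw = len(weights), len(weights[0])
    flat_w = [w for row in weights for w in row]
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        out_row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            window = [image[i + a][j + b]
                      for a in range(kh) for b in range(kw)]
            out_row.append(weighted_power_pool(window, flat_w, p))
        out.append(out_row)
    return out


image = [[1.0, 2.0], [3.0, 4.0]]
square = [[1.0, 1.0], [1.0, 1.0]]   # ordinary 2x2 kernel
strip = [[1.0, 1.0], [0.0, 0.0]]    # behaves like a 1x2 horizontal kernel

avg_like = pool2d(image, square, 1, 2)    # average pooling
max_like = pool2d(image, square, 64, 2)   # close to max pooling
strip_avg = pool2d(image, strip, 1, 2)    # averages the top row only
```

In this sketch the weights and p are fixed by hand. In a trainable layer they would be updated by gradient descent, and in a meta-learning setting they would be shared across tasks.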

Author information

Corresponding author

Correspondence to Hideaki Hayashi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Otsuzuki, T., Song, H., Uchida, S., Hayashi, H. (2021). Meta-learning of Pooling Layers for Character Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_13

  • DOI: https://doi.org/10.1007/978-3-030-86334-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86333-3

  • Online ISBN: 978-3-030-86334-0

  • eBook Packages: Computer Science, Computer Science (R0)
