
Dilated Residual Aggregation Network for Text-Guided Image Manipulation

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12893)


Abstract

Text-guided image manipulation aims to modify the visual attributes of images according to textual descriptions. Existing works either suffer from mismatches between the generated images and the textual descriptions or pollute text-irrelevant image regions. In this paper, we propose a dilated residual aggregation network (denoted as DRA) for text-guided image manipulation, which exploits a long-distance residual with dilated convolutions (RD) to aggregate the encoded visual content and style features with the textual features of the guiding descriptions. In particular, the dilated convolutions enlarge the receptive field without sacrificing the spatial resolution of the intermediate features, which benefits the reconstruction of texture details that match the textual descriptions. Furthermore, we propose an attention-guided injection module (AIM) that injects textual semantics into the feature maps of DRA without polluting text-irrelevant image regions, by combining a triplet attention mechanism with central biasing instance normalization. Quantitative and qualitative experiments on the CUB-200-2011 and Oxford-102 datasets demonstrate the superior performance of the proposed DRA.
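
The abstract names two concrete building blocks: dilated convolutions inside a residual path, and central biasing instance normalization used by the attention-guided injection module (AIM) to inject textual semantics. The paper page does not include code; the PyTorch-style sketch below is only a rough illustration of those two ideas under our own assumptions. The module names, channel sizes, and the linear mapping from a sentence embedding to a per-channel bias are hypothetical, and the triplet attention part of AIM is omitted; this is not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code): a dilated residual block
# and a text-conditioned "central biasing" instance normalization layer.
import torch
import torch.nn as nn


class DilatedResidualBlock(nn.Module):
    """3x3 convolutions with dilation; padding = dilation keeps the spatial size."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        # Residual skip around the dilated path: larger receptive field,
        # same spatial resolution.
        return x + self.body(x)


class CentralBiasingInstanceNorm(nn.Module):
    """Instance normalization whose per-channel bias is predicted from the text embedding."""
    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_bias = nn.Linear(text_dim, channels)

    def forward(self, x, text_emb):
        bias = self.to_bias(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        # Inject textual semantics as a per-channel shift of the normalized features.
        return self.norm(x) + bias


# Toy usage: a 64-channel feature map and a 256-d sentence embedding (both made up).
feat = torch.randn(2, 64, 32, 32)
sent = torch.randn(2, 256)
feat = DilatedResidualBlock(64, dilation=2)(feat)
feat = CentralBiasingInstanceNorm(64, 256)(feat, sent)
print(feat.shape)  # torch.Size([2, 64, 32, 32])
```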



Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 62076073, No. 61902077), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524, No. 202007040005), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), the Science and Technology Planning Project of Guangdong Province (LZC0023), and the Hong Kong RGC CRF Project C1031-18G.

Author information


Correspondence to Zhenguo Yang or Wenyin Liu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Lu, S., Luo, D., Yang, Z., Hao, T., Li, Q., Liu, W. (2021). Dilated Residual Aggregation Network for Text-Guided Image Manipulation. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol. 12893. Springer, Cham. https://doi.org/10.1007/978-3-030-86365-4_3


  • DOI: https://doi.org/10.1007/978-3-030-86365-4_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86364-7

  • Online ISBN: 978-3-030-86365-4

  • eBook Packages: Computer Science, Computer Science (R0)
