
Dilated Residual Aggregation Network for Text-Guided Image Manipulation

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2021 (ICANN 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12893)


Abstract

Text-guided image manipulation aims to modify the visual attributes of images according to textual descriptions. Existing works either suffer from mismatches between the generated images and the textual descriptions or pollute text-irrelevant image regions. In this paper, we propose a dilated residual aggregation network (denoted as DRA) for text-guided image manipulation, which exploits a long-distance residual with dilated convolutions (RD) to aggregate the encoded visual content and style features with the textual features of the guiding descriptions. In particular, the dilated convolutions enlarge the receptive field without sacrificing the spatial resolution of the intermediate features, which benefits the reconstruction of texture details that match the textual descriptions. Furthermore, we propose an attention-guided injection module (AIM) that injects textual semantics into the feature maps of DRA without polluting text-irrelevant image regions, by combining a triplet attention mechanism with central biasing instance normalization. Quantitative and qualitative experiments on the CUB-200-2011 and Oxford-102 datasets demonstrate the superior performance of the proposed DRA.
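
The abstract names two concrete building blocks: dilated convolutions inside a residual path, and central biasing instance normalization used by the attention-guided injection module (AIM) to inject textual semantics. The paper page does not include code; the PyTorch-style sketch below is only a rough illustration of those two ideas under our own assumptions. The module names, channel sizes, and the linear mapping from a sentence embedding to a per-channel bias are hypothetical, and the triplet attention part of AIM is omitted; this is not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' code): a dilated residual block
# and a text-conditioned "central biasing" instance normalization layer.
import torch
import torch.nn as nn


class DilatedResidualBlock(nn.Module):
    """3x3 convolutions with dilation; padding = dilation keeps the spatial size."""
    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        # Residual skip around the dilated path: larger receptive field,
        # same spatial resolution.
        return x + self.body(x)


class CentralBiasingInstanceNorm(nn.Module):
    """Instance normalization whose per-channel bias is predicted from the text embedding."""
    def __init__(self, channels: int, text_dim: int):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_bias = nn.Linear(text_dim, channels)

    def forward(self, x, text_emb):
        bias = self.to_bias(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        # Inject textual semantics as a per-channel shift of the normalized features.
        return self.norm(x) + bias


# Toy usage: a 64-channel feature map and a 256-d sentence embedding (both made up).
feat = torch.randn(2, 64, 32, 32)
sent = torch.randn(2, 256)
feat = DilatedResidualBlock(64, dilation=2)(feat)
feat = CentralBiasingInstanceNorm(64, 256)(feat, sent)
print(feat.shape)  # torch.Size([2, 64, 32, 32])
```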



Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 62076073, No. 61902077), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524, No. 202007040005), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), the Science and Technology Planning Project of Guangdong Province (LZC0023), and the Hong Kong RGC CRF Project C1031-18G.

Author information


Correspondence to Zhenguo Yang or Wenyin Liu.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Lu, S., Luo, D., Yang, Z., Hao, T., Li, Q., Liu, W. (2021). Dilated Residual Aggregation Network for Text-Guided Image Manipulation. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science, vol. 12893. Springer, Cham. https://doi.org/10.1007/978-3-030-86365-4_3


  • DOI: https://doi.org/10.1007/978-3-030-86365-4_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86364-7

  • Online ISBN: 978-3-030-86365-4

  • eBook Packages: Computer Science, Computer Science (R0)
