Generative adversarial network based on semantic consistency for text-to-image generation

Published in Applied Intelligence

Abstract

Although text-to-image generation has made significant progress in producing visually realistic images, the generated images are often not fully consistent with their text descriptions. In this paper, a novel generative adversarial network based on semantic consistency is proposed to generate semantically consistent and realistic images from text descriptions. The proposed method exploits the semantic consistency between text and image for efficient cross-modal generation, combining image generation with semantic correlation. A generation network with hybrid attention produces images at multiple resolutions, which improves the authenticity of the generated images. In addition, a semantic comparison module maps the texts and the generated images into a common semantic space, where they are compared through consistency refinement and information classification. Extensive experiments on public benchmark datasets demonstrate that the proposed method outperforms the comparative methods.
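The semantic comparison module can be pictured as projecting both modalities into a shared embedding space and penalizing text-image mismatches. Below is a minimal PyTorch sketch of that idea only; the module names (SemanticComparison, consistency_loss), the feature dimensions, and the margin-based ranking loss are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a text-image semantic consistency objective.
# Encoder dimensions, projection heads, and the hinge formulation are
# assumptions for illustration, not the paper's exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticComparison(nn.Module):
    """Projects text and image features into a shared semantic space
    and scores their consistency with cosine similarity."""
    def __init__(self, text_dim=256, image_dim=512, embed_dim=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)    # text -> shared space
        self.image_proj = nn.Linear(image_dim, embed_dim)  # image -> shared space

    def forward(self, text_feat, image_feat):
        t = F.normalize(self.text_proj(text_feat), dim=-1)
        v = F.normalize(self.image_proj(image_feat), dim=-1)
        return (t * v).sum(dim=-1)  # per-pair cosine similarity

def consistency_loss(sim_matched, sim_mismatched, margin=0.2):
    """Hinge ranking loss: matched text-image pairs should score higher
    than mismatched pairs by at least `margin`."""
    return F.relu(margin - sim_matched + sim_mismatched).mean()

# Usage with stand-in features for a batch of 4 text-image pairs
module = SemanticComparison()
text = torch.randn(4, 256)
images = torch.randn(4, 512)
sim_pos = module(text, images)                  # aligned pairs
sim_neg = module(text, images.roll(1, dims=0))  # mismatched pairs (shifted batch)
loss = consistency_loss(sim_pos, sim_neg)
```

Under this kind of objective, the generator is encouraged to produce images whose embeddings sit close to their conditioning text, which is one standard way to enforce cross-modal semantic consistency.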



Data availability

The datasets generated and analysed during this study are available in the following repositories: http://cocodataset.org and http://www.vision.caltech.edu/visipedia/CUB-200-2011.html. All other data are available from the authors upon reasonable request.


Author information

Corresponding author

Correspondence to Li Liu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ma, Y., Liu, L., Zhang, H. et al. Generative adversarial network based on semantic consistency for text-to-image generation. Appl Intell 53, 4703–4716 (2023). https://doi.org/10.1007/s10489-022-03660-8
