Abstract
Realistic text-to-image synthesis has improved considerably in recent years. However, most prior work ignores the relationship between low- and high-resolution generation and adopts identical modules at every stage, which is inappropriate because the requirements of the different generation stages differ substantially. We therefore propose a novel network structure, SS-GANs, in which stage-specific modules are added to satisfy the unique requirements of each stage. In addition, we explore an effective training scheme called coordinated training and a simple negative-sample selection mechanism. Trained on the Oxford-102 dataset, our model outperforms state-of-the-art models.
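The core idea above, using distinct modules at different generation stages rather than one shared block, can be sketched as follows. This is a minimal illustration only: the module choices, layer sizes, and names (`stage1_generator`, `stage2_generator`) are assumptions for exposition, not the paper's actual SS-GANs architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_generator(z, text_emb, w1):
    # Early stage: map noise + text embedding to a coarse 8x8 layout.
    # A plain dense projection stands in for the low-resolution module.
    h = np.concatenate([z, text_emb])
    return np.tanh(w1 @ h).reshape(8, 8)

def stage2_generator(coarse, w2):
    # Later stage: a different, stage-specific module — here
    # nearest-neighbour upsampling plus a learned residual correction,
    # standing in for the high-resolution refinement module.
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)     # 8x8 -> 16x16
    residual = np.tanh(w2 @ up.ravel()).reshape(16, 16)
    return up + 0.1 * residual

z = rng.normal(size=100)           # noise vector
text_emb = rng.normal(size=128)    # sentence embedding (assumed precomputed)
w1 = rng.normal(size=(64, 228)) * 0.05   # 228 = 100 + 128
w2 = rng.normal(size=(256, 256)) * 0.05  # 256 = 16 * 16

coarse = stage1_generator(z, text_emb, w1)
fine = stage2_generator(coarse, w2)
print(coarse.shape, fine.shape)
```

The point of the sketch is structural: each stage owns its own parameters and its own kind of transformation, so the low-resolution layout module and the high-resolution refinement module can be tailored independently instead of reusing one identical block.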
The first author, Ming Tian, is a master's candidate.
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grants 61571354 and 61671385, and in part by the China Postdoctoral Science Foundation under Grant 158201.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Tian, M., Xue, Y., Tian, C., Wang, L., Deng, D., Wei, W. (2019). SS-GANs: Text-to-Image via Stage by Stage Generative Adversarial Networks. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science(), vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31722-5
Online ISBN: 978-3-030-31723-2