
Precise Correspondence Enhanced GAN for Person Image Generation


Abstract

Generating realistic person images in pose-guided person image generation, especially for local body parts, is challenging for two reasons: (1) the difficulty of modeling long-range relations, and (2) a deficiency in capturing precise local correspondence. We propose a Precise Correspondence Enhanced Generative Adversarial Network (PCE-GAN) to address these problems. PCE-GAN consists of a global branch and a local branch: the former maintains the global consistency of the generated person image, while the latter captures precise local correspondence. More specifically, long-range relations are established via the spatial-channel Multi-Layer Perceptron (MLP) module in the transformation blocks of both branches, and precise local correspondence is captured effectively by the local branch's local-pair building and local-guiding modules. Finally, the outputs of the two branches are combined so that they benefit from each other's enhanced correspondences. Experimental results show that, compared to previous state-of-the-art methods on the Market-1501 dataset, PCE-GAN performs quantitatively better, with \(5.53\%\) and \(7.74\%\) improvements in SSIM and IS scores, respectively. Qualitative results on both the Market-1501 and DeepFashion datasets further validate the effectiveness of our method.


References

  1. Ma L, Jia X, Sun Q, Schiele B, Tuytelaars T, Van Gool L (2017) Pose guided person image generation. In: Advances in neural information processing systems, pp 406–416

  2. Ma L, Sun Q, Georgoulis S, Van Gool L, Schiele B, Fritz M (2018) Disentangled person image generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 99–108

  3. Siarohin A, Sangineto E, Lathuiliere S, Sebe N (2018) Deformable gans for pose-based human image generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3408–3416

  4. Zhu Z, Huang T, Shi B, Yu M, Wang B, Bai X (2019) Progressive pose attention transfer for person image generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2347–2356

  5. AlBahar B, Huang J-B (2019) Guided image-to-image translation with bi-directional feature transformation. In: Proceedings of the IEEE international conference on computer vision, pp 9016–9025

  6. Men Y, Mao Y, Jiang Y, Ma W-Y, Lian Z (2020) Controllable person image synthesis with attribute-decomposed gan. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5084–5093

  7. Lv Z, Li X, Li X, Li F, Lin T, He D, Zuo W (2021) Learning semantic person image generation by region-adaptive normalization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 10806–10815

  8. Siarohin A, Woodford OJ, Ren J, Chai M, Tulyakov S (2021) Motion representations for articulated animation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 13653–13662

  9. Tang H, Xu D, Liu G, Wang W, Sebe N, Yan Y (2019) Cycle in cycle generative adversarial networks for keypoint-guided image generation. In: Proceedings of the ACM international conference on multimedia, pp 2052–2060

  10. Tang H, Bai S, Zhang L, Torr PH, Sebe N (2020) Xinggan for person image generation. In: Proceedings of the European conference on computer vision, pp 717–734

  11. Tang H, Bai S, Torr PH, Sebe N (2020) Bipartite graph reasoning gans for person image generation. In: British machine vision conference

  12. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324


  13. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems, vol 27

  14. Fister I Jr, Perc M, Ljubič K, Kamal SM, Iglesias A, Fister I (2015) Particle swarm optimization for automatic creation of complex graphic characters. Chaos Solit Fract 73:29–35


  15. Kingma DP, Welling M (2013) Auto-encoding variational bayes. In: International conference on learning representations

  16. Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784

  17. Han Z, Huang H (2021) Gan based three-stage-training algorithm for multi-view facial expression recognition. Neural Process Lett 53(6):4189–4205


  18. Xiang X, Yu Z, Lv N, Kong X, Saddik AE (2020) Attention-based generative adversarial network for semi-supervised image classification. Neural Process Lett 51(2):1527–1540


  19. Wen J, Shen Y, Yang J (2022) Multi-view gait recognition based on generative adversarial network. Neural Process Lett 1–23

  20. Brock A, Donahue J, Simonyan K (2018) Large scale gan training for high fidelity natural image synthesis. In: International conference on learning representations

  21. Shaham TR, Dekel T, Michaeli T (2019) Singan: learning a generative model from a single natural image. In: Proceedings of the IEEE international conference on computer vision, pp 4570–4580

  22. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4401–4410

  23. Esser P, Sutter E, Ommer B (2018) A variational u-net for conditional appearance and shape generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8857–8866

  24. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612


  25. Zakharov E, Shysheya A, Burkov E, Lempitsky V (2019) Few-shot adversarial learning of realistic neural talking head models. In: Proceedings of the IEEE international conference on computer vision, pp 9459–9468

  26. Kim J, Kim M, Kang H, Lee KH (2019) U-gat-it: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. In: International conference on learning representations

  27. Alami Mejjati Y, Richardt C, Tompkin J, Cosker D, Kim KI (2018) Unsupervised attention-guided image-to-image translation. Adv Neural Inf Process Syst 31:3693–3703


  28. Park T, Liu M-Y, Wang T-C, Zhu J-Y (2019) Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2337–2346

  29. Ren B, Tang H, Sebe N (2021) Cascaded cross mlp-mixer gans for cross-view image translation. In: British machine vision conference

  30. Balakrishnan G, Zhao A, Dalca AV, Durand F, Guttag J (2018) Synthesizing images of humans in unseen poses. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8340–8348

  31. Lassner C, Pons-Moll G, Gehler PV (2017) A generative model of people in clothing. In: Proceedings of the IEEE international conference on computer vision, pp 853–862

  32. Wang B, Zheng H, Liang X, Chen Y, Lin L, Yang M (2018) Toward characteristic-preserving image-based virtual try-on network. In: Proceedings of the European conference on computer vision, pp 589–604

  33. Neverova N, Alp Guler R, Kokkinos I (2018) Dense pose transfer. In: Proceedings of the European conference on computer vision, pp 123–138

  34. Li Y, Huang C, Loy CC (2019) Dense intrinsic appearance flow for human pose transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3693–3702

  35. Zanfir M, Oneata E, Popa A-I, Zanfir A, Sminchisescu C (2020) Human synthesis and scene compositing. Proc AAAI Conf Artif Intell 34:12749–12756


  36. Zhang J, Li K, Lai Y-K, Yang J (2021) Pise: person image synthesis and editing with decoupled gan. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7982–7990

  37. Cao Z, Simon T, Wei S-E, Sheikh Y (2017) Realtime multi-person 2D pose estimation using part affinity fields. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7291–7299

  38. Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the IEEE international conference on computer vision

  39. Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1096–1104

  40. Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. In: Advances in neural information processing systems, vol 29

  41. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: International conference on learning representations

  42. Huang S, Xiong H, Cheng Z-Q, Wang Q, Zhou X, Wen B, Huan J, Dou D (2020) Generating person images with appearance-aware pose stylizer. In: International joint conference on artificial intelligence

  43. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: an imperative style, high-performance deep learning library. In: Advances in neural information processing systems, vol 32

  44. Ren Y, Yu X, Chen J, Li TH, Li G (2020) Deep image spatial transformation for person image generation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7690–7699

  45. Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2D human pose estimation: new benchmark and state of the art analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition


Author information


Corresponding author

Correspondence to Yuesheng Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Details of the Evaluation Metrics

SSIM and Mask-SSIM The Structural Similarity Index Measure (SSIM) [24] measures the similarity between generated images and real images at the pixel level in terms of their luminance, contrast, and structure. A larger SSIM value indicates greater similarity.

The SSIM is calculated over various windows of an image. The value between two windows x and y of size \(N \times N\) can be formulated as follows:

$$\begin{aligned} {\text {SSIM}}(x, y)=\frac{\left( 2 \mu _{x} \mu _{y}+c_{1}\right) \left( 2 \sigma _{x y}+c_{2}\right) }{\left( \mu _{x}^{2}+\mu _{y}^{2}+c_{1}\right) \left( \sigma _{x}^{2}+\sigma _{y}^{2}+c_{2}\right) } \end{aligned}$$
(12)

where \(\mu _{x}\), \(\mu _{y}\), \(\sigma _{x}^{2}\), and \(\sigma _{y}^{2}\) denote the mean of x, the mean of y, the variance of x, and the variance of y, respectively. \(\sigma _{xy}\) is the covariance of x and y. \(c_1\) and \(c_2\) are two constants that stabilize the division when the denominator is small.

Note that Mask-SSIM is the masked version of SSIM; the only difference is that it is computed on images with the background removed.
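For illustration, a minimal NumPy sketch of Eq. (12) for a single pair of windows is given below. The constants \(c_1\) and \(c_2\) are common choices rather than the exact settings used for the reported scores, and full-image SSIM averages this quantity over sliding windows (not shown).

import numpy as np

def ssim_window(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM between two equally sized image windows x and y (Eq. 12)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()            # window means
    var_x, var_y = x.var(), y.var()            # window variances
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # covariance of x and y
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

# Example: identical windows yield an SSIM of 1.
patch = np.random.randint(0, 256, (11, 11))
print(ssim_window(patch, patch))  # 1.0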

IS and Mask-IS The Inception Score (IS) is a popular metric for judging the outputs of a GAN [40]. The score simultaneously measures (a) the quality of the generated images and (b) their diversity. A higher score is better, indicating that the GAN generates a wide range of distinct, recognizable images.

$$\begin{aligned} {\text {IS}}(G)=\exp \left( {\mathbb {E}}_{x \sim p_{g}} D_{KL}\left( p(y \mid x) \,\Vert \, p(y)\right) \right) \end{aligned}$$
(13)

where p(y) is the marginal class distribution over the generated images, \(p(y \mid x)\) is the class distribution predicted for a generated sample x, and \(D_{KL}\) is the Kullback–Leibler divergence.

Note that Mask-IS is the masked version of IS and is computed in the same way; the only difference is that it is applied to images with the background removed.
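As a sketch of Eq. (13), assuming the class posteriors have already been obtained from a pretrained Inception classifier (not shown here), the score can be computed as follows; the commonly used split-and-average procedure is omitted.

import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score (Eq. 13) from class posteriors p(y|x).

    probs: array of shape (num_images, num_classes), each row the softmax
    output of an Inception classifier for one generated image.
    """
    p_y = probs.mean(axis=0, keepdims=True)                  # marginal p(y)
    kl = probs * (np.log(probs + eps) - np.log(p_y + eps))   # per-class KL terms
    return float(np.exp(kl.sum(axis=1).mean()))              # exp of the expected KL

# Example with random (untrained) posteriors over 1000 classes:
probs = np.random.dirichlet(np.ones(1000), size=64)
print(inception_score(probs))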

PCKh The PCKh metric is used to quantify the shape consistency of the generated images. Specifically, the person shape is represented by 18 pose joints obtained from the human pose estimator [45]. Shape consistency is then approximated by pose-joint alignment, which is evaluated with the PCKh measure. Under this protocol, the PCKh score is the percentage of keypoint pairs whose offset is below half the size of the head segment.

$$\begin{aligned} PCK_{i}^{k}=\frac{\sum _{p} \delta \left( \frac{d_{pi}}{d_{p}^{\mathrm {def}}} \le T_{k}\right) }{\sum _{p} 1} \end{aligned}$$
(14)

where i indexes the keypoints, k indexes the threshold \(T_k\), and p indexes the persons. \(\delta (\cdot )\) is the indicator function, \(d_{pi}\) is the Euclidean distance between the i-th keypoint of person p and the ground truth, and \(d_{p}^{\mathrm {def}}\) is half the size of the head segment mentioned above.
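A minimal NumPy sketch of Eq. (14) is given below, assuming predicted and ground-truth joints are already available as coordinate arrays; the handling of missing or invisible joints and the exact threshold used for the reported scores are not covered here.

import numpy as np

def pckh(pred, gt, head_half, thresh=1.0):
    """Per-keypoint PCKh following Eq. (14).

    pred, gt:  arrays of shape (num_people, num_joints, 2) with predicted and
               ground-truth joint coordinates.
    head_half: array of shape (num_people,), half the head-segment size d_p^def.
    thresh:    threshold T_k on the normalized offset.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)      # d_{pi} for every person/joint
    hits = (dist / head_half[:, None]) <= thresh   # indicator delta(.)
    return hits.mean(axis=0)                       # average over persons p

# Example with 18 joints for 4 people:
pred = np.random.rand(4, 18, 2) * 100
gt = pred + np.random.randn(4, 18, 2)              # small perturbation of the prediction
print(pckh(pred, gt, head_half=np.full(4, 20.0)))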

Appendix B Code of Spatial-Channel MLP

To facilitate the use of our proposed method, PyTorch-style code for the plug-and-play Spatial-Channel MLP module is provided below:

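The following is a minimal sketch of such a spatial-channel MLP block: it mixes features along the spatial (token) dimension and then along the channel dimension, each with a residual connection. Layer sizes, normalization, and naming here are illustrative assumptions and may differ from the released implementation.

import torch
import torch.nn as nn

class SpatialChannelMLP(nn.Module):
    """Plug-and-play spatial-channel MLP block (illustrative sketch).

    Input:  feature tokens of shape (batch, num_tokens, channels),
            e.g. a flattened (H * W) feature map.
    Output: tokens of the same shape after spatial and channel mixing.
    """

    def __init__(self, channels, num_tokens, hidden_ratio=2):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.spatial_mlp = nn.Sequential(          # mixes along the token axis
            nn.Linear(num_tokens, num_tokens * hidden_ratio),
            nn.GELU(),
            nn.Linear(num_tokens * hidden_ratio, num_tokens),
        )
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = nn.Sequential(          # mixes along the channel axis
            nn.Linear(channels, channels * hidden_ratio),
            nn.GELU(),
            nn.Linear(channels * hidden_ratio, channels),
        )

    def forward(self, x):
        # Spatial mixing: transpose to (batch, channels, num_tokens) so the
        # linear layers act on the token dimension, then add the residual.
        y = self.norm1(x).transpose(1, 2)
        x = x + self.spatial_mlp(y).transpose(1, 2)
        # Channel mixing with a second residual connection.
        x = x + self.channel_mlp(self.norm2(x))
        return x

# Example: a 64 x 32 feature map with 256 channels, flattened into tokens.
block = SpatialChannelMLP(channels=256, num_tokens=64 * 32)
out = block(torch.randn(4, 64 * 32, 256))
print(out.shape)  # torch.Size([4, 2048, 256])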


About this article


Cite this article

Liu, J., Zhu, Y. Precise Correspondence Enhanced GAN for Person Image Generation. Neural Process Lett 54, 5125–5142 (2022). https://doi.org/10.1007/s11063-022-10853-2

