FSNet: An Identity-Aware Generative Model for Image-Based Face Swapping

Natsume, Ryota; Yatagawa, Tatsuya; Morishima, Shigeo

doi:10.1007/978-3-030-20876-9_8

Ryota Natsume¹⁸,
Tatsuya Yatagawa¹⁸ &
Shigeo Morishima^18,19

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11366))

Included in the following conference series:

Asian Conference on Computer Vision

2187 Accesses
19 Citations
14 Altmetric

Abstract

This paper presents FSNet, a deep generative model for image-based face swapping. Traditionally, face-swapping methods are based on three-dimensional morphable models (3DMMs), and facial textures are replaced between the estimated three-dimensional (3D) geometries in two images of different individuals. However, the estimation of 3D geometries along with different lighting conditions using 3DMMs is still a difficult task. We herein represent the face region with a latent variable that is assigned with the proposed deep neural network (DNN) instead of facial textures. The proposed DNN synthesizes a face-swapped image using the latent variable of the face region and another image of the non-face region. The proposed method is not required to fit to the 3DMM; additionally, it performs face swapping only by feeding two face images to the proposed network. Consequently, our DNN-based face swapping performs better than previous approaches for challenging inputs with different face orientations and lighting conditions. Through several experiments, we demonstrated that the proposed method performs face swapping in a more stable manner than the state-of-the-art method, and that its results are compatible with the method thereof.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pp. 187–194 (1999)
Google Scholar
Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: FaceWarehouse: a 3D facial expression database for visual computing. IEEE Trans. Vis. Comput. Graph. 20, 413–425 (2014)
Article Google Scholar
Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: IEEE International Conference on Computer Vision (ICCV), pp. 3730–3738 (2015)
Google Scholar
Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23, 1499–1503 (2016)
Article Google Scholar
Blanz, V., Scherbaum, K., Vetter, T., Seidel, H.P.: Exchanging faces in images. Comput. Graph. Forum 23, 669–676 (2004)
Article Google Scholar
Bitouk, D., Kumar, N., Dhillon, S., Belhumeur, P., Nayar, S.K.: Face swapping: automatically replacing faces in photographs. ACM Trans. Graph. (TOG) 27, 39:1–39:8 (2008)
Article Google Scholar
Yang, F., Wang, J., Shechtman, E., Bourdev, L., Metaxas, D.: Expression flow for 3D-aware face component transfer. ACM Trans. Graph. 30, 60:1–60:10 (2011)
Google Scholar
Chai, M., Wang, L., Weng, Y., Yu, Y., Guo, B., Zhou, K.: Single-view hair modeling for portrait manipulation. ACM Trans. Graph. 31, 116:1–116:8 (2012)
Article Google Scholar
Kemelmacher-Shlizerman, I.: Transfiguring portraits. ACM Trans. Graph. 35, 94:1–94:8 (2016)
Article Google Scholar
Mosaddegh, S., Simon, L., Jurie, F.: Photorealistic face de-identification by aggregating donors’ face components. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) ACCV 2014. LNCS, vol. 9005, pp. 159–174. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16811-1_11
Chapter Google Scholar
Korshunova, I., Shi, W., Dambre, J., Theis, L.: Fast face-swap using convolutional neural networks. arXiv preprint arXiv:1611.09577 (2016)
Hassner, T.: Viewing real-world faces in 3D. In: IEEE International Conference on Computer Vision (ICCV), pp. 3607–3614 (2013)
Google Scholar
McLaughlin, N., Martinez-del Rincon, J., Miller, P.: Data-augmentation for reducing dataset bias in person re-identification. In: IEEE International Conference on Advanced Video and Signal Based Surveillance, pp. 1–6 (2015)
Google Scholar
Masi, I., Trãn, A.T., Hassner, T., Leksut, J.T., Medioni, G.: Do we really need to collect millions of faces for effective face recognition? In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 579–596. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_35
Chapter Google Scholar
Nirkin, Y., Masi, I., Tran, A.T., Hassner, T., Medioni, G.: On face segmentation, face swapping, and face perception. In: IEEE Conference on Automatic Face and Gesture Recognition (2018)
Google Scholar
Bao, J., Chen, D., Wen, F., Li, H., Hua, G.: CVAE-GAN: fine-grained image generation through asymmetric training. In: IEEE International Conference on Computer Vision (ICCV), pp. 2745–2754 (2017)
Google Scholar
FakeApp (2018). https://www.fakeapp.org/
Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Google Scholar
Natsume, R., Yatagawa, T., Morishima, S.: RSGAN: face swapping and editing using face and hair representation in latent spaces. arXiv prepring arXiv:1804.03447 (2018)
Bao, J., Chen, D., Wen, F., Li, H., Hua, G.: Towards open-set identity preserving face synthesis. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Google Scholar
Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. 36, 107:1–107:14 (2017)
Article Google Scholar
Chen, Z., Nie, S., Wu, T., Healey, C.G.: High resolution face completion with multiple controllable attributes via fully end-to-end progressive generative adversarial networks. arXiv preprint arXiv:1801.07632 (2018)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Larsen, A.B.L., Kaae Sønderby, S., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. arXiv preprint arXiv:1512.09300 (2015)
Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based cnn with improved triplet loss function. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1335–1344 (2016)
Google Scholar
Rosca, M., Lakshminarayanan, B., Warde-Farley, D., Mohamed, S.: Variational approaches for auto-encoding generative adversarial networks. arXiv preprint arXiv:1706.04987 (2017)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
King, D.E.: Dlib-ml: a machine learning toolkit. J. Mach. Learn. Res. 10, 1755–1758 (2009)
Google Scholar
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. arXiv preprint arXiv:1612.01105 (2016)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, pp. 1398–1402 (2003)
Google Scholar
Amos, B., Ludwiczuk, B., Satyanarayanan, M.: OpenFace: a general-purpose face recognition library with mobile applications. Technical report, CMU School of Computer Science (2016)
Google Scholar
Salimans, T., et al.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems (NIPS), no. 29, pp. 2234–2242 (2016)
Google Scholar
Saito, S., Li, T., Li, H.: Real-time facial segmentation and performance capture from RGB input. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 244–261. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_15
Chapter Google Scholar

Download references

Acknowledgments

This study was granted in part by the Strategic Basic Research Program ACCEL of the Japan Science and Technology Agency (JPMJAC1602). Tatsuya Yatagawa was supported by the Research Fellowship for Young Researchers of Japan’s Society for the Promotion of Science (16J02280). Shigeo Morishima was supported by a Grant-in-Aid from Waseda Institute of Advanced Science and Engineering. The authors would also like to acknowledge NVIDIA Corporation for providing their GPUs in the academic GPU Grant Program.

Author information

Authors and Affiliations

Waseda University, 3-4-1, Ohkubo, Shinjuku, Tokyo, 169-8555, Japan
Ryota Natsume, Tatsuya Yatagawa & Shigeo Morishima
Waseda Research Institute of Science and Engineering, Tokyo, Japan
Shigeo Morishima

Authors

Ryota Natsume
View author publications
You can also search for this author in PubMed Google Scholar
Tatsuya Yatagawa
View author publications
You can also search for this author in PubMed Google Scholar
Shigeo Morishima
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ryota Natsume .

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5646 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Natsume, R., Yatagawa, T., Morishima, S. (2019). FSNet: An Identity-Aware Generative Model for Image-Based Face Swapping. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11366. Springer, Cham. https://doi.org/10.1007/978-3-030-20876-9_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-20876-9_8
Published: 26 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20875-2
Online ISBN: 978-3-030-20876-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics