A Method for Scene Text Style Transfer

Zhou, Gaojing; Wang, Lei; Liu, Xi; Zhou, Yongsheng; Zhang, Rui; Wei, Xiaolin

doi:10.1007/978-3-030-57058-3_39

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12116)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Text style transfer is a challenging problem in optical character recognition. Recent work mainly uses a desired text style to guide the model in synthesizing text images, and the surrounding scene is usually ignored. In natural scenes, however, the scene and the text form a whole. Scene text image translation poses two key challenges: (i) transferring the text and the scene into different styles, and (ii) keeping the scene and the text consistent. To address these problems, we propose a novel end-to-end scene text style transfer framework that simultaneously translates the text instances and the scene background with different styles. We introduce an attention style encoder to extract style codes for the text instances and the scene, and we perform style transfer training on the cropped text area and the scene separately to ensure that the generated images are harmonious. We evaluate our method on the ICDAR2015 and MSRA-TD500 scene text datasets. The experimental results demonstrate that the synthetic images generated by our model can benefit the scene text detection task.
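
The idea of extracting separate style codes for the text area and the scene can be illustrated with a toy sketch. This is not the authors' architecture: the abstract describes a learned attention style encoder, whereas the code below substitutes per-channel mean and standard deviation as a stand-in "style code" and a simple statistic-matching step as a stand-in for style transfer. The function names (style_code, split_regions, restyle), the bounding-box format, and the sample image are all hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def style_code(pixels):
    """Per-channel (mean, std) of RGB pixels: a toy stand-in for a learned style code."""
    channels = list(zip(*pixels))
    return [(fmean(c), pstdev(c)) for c in channels]


def split_regions(image, box):
    """Split an image (list of rows of RGB tuples) into text-box pixels and scene pixels."""
    x0, y0, x1, y1 = box  # hypothetical text box, half-open: [x0, x1) x [y0, y1)
    text, scene = [], []
    for y, row in enumerate(image):
        for x, px in enumerate(row):
            inside = x0 <= x < x1 and y0 <= y < y1
            (text if inside else scene).append(px)
    return text, scene


def restyle(pixels, source, target, eps=1e-6):
    """Move pixel statistics from the source code to the target code, channel by channel."""
    return [
        tuple(
            (v - s_mean) / (s_std + eps) * t_std + t_mean
            for v, (s_mean, s_std), (t_mean, t_std) in zip(px, source, target)
        )
        for px in pixels
    ]


# Assumed sample data: a 4x4 dark scene with a bright 2x2 "text" patch.
image = [[(10.0, 20.0, 30.0)] * 4 for _ in range(4)]
for y in range(1, 3):
    for x in range(1, 3):
        image[y][x] = (200.0, 210.0, 220.0)

text_px, scene_px = split_regions(image, (1, 1, 3, 3))
text_code = style_code(text_px)    # style code of the cropped text area
scene_code = style_code(scene_px)  # style code of the scene background

# Restyle only the text pixels toward a different, assumed target style;
# the scene pixels could be restyled independently with another target.
target_text_code = [(50.0, 5.0)] * 3
new_text_px = restyle(text_px, text_code, target_text_code)
```

Keeping two codes lets each region be pushed toward its own target style, which mirrors the separation between text instances and scene described in the abstract. How the two results are blended into a harmonious image is the part the learned model handles and is not covered here.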

Author information

Authors and Affiliations

Meituan-Dianping Group, Beijing, China
Gaojing Zhou, Lei Wang, Xi Liu, Yongsheng Zhou, Rui Zhang & Xiaolin Wei

Corresponding author

Correspondence to Gaojing Zhou.

Editor information

Editors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Xiang Bai
Autonomous University of Barcelona, Barcelona, Spain
Dimosthenis Karatzas
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti

About this paper

Cite this paper

Zhou, G., Wang, L., Liu, X., Zhou, Y., Zhang, R., Wei, X. (2020). A Method for Scene Text Style Transfer. In: Bai, X., Karatzas, D., Lopresti, D. (eds) Document Analysis Systems. DAS 2020. Lecture Notes in Computer Science, vol 12116. Springer, Cham. https://doi.org/10.1007/978-3-030-57058-3_39

DOI: https://doi.org/10.1007/978-3-030-57058-3_39
Published: 14 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57057-6
Online ISBN: 978-3-030-57058-3
eBook Packages: Computer Science, Computer Science (R0)
