Scene Text Recognition in the Wild with Motion Deblurring Using Deep Networks

Anand, Sukhad; Susan, Seba; Aggarwal, Shreshtha; Aggarwal, Shubham; Singla, Rajat

doi:10.1007/978-981-16-1103-2_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1378)

Included in the following conference series:

International Conference on Computer Vision and Image Processing

Abstract

This paper addresses text detection and recognition in videos. We focus on two major issues that make it difficult to extract information from video captured by a moving vehicle. The first is motion blur, which is prevalent in such video and is a major obstacle to accurate text recognition. The second is the orientation of the text being detected, which may not lie in the same plane. We propose a novel end-to-end pipeline of deep networks: a fully convolutional network to detect text; a Generative Adversarial Network to remove motion blur; a rectification network, which uses thin-plate spline transformations and a Spatial Transformer Network, to handle text that is not straight (i.e., perspective-distorted or curved); and a recognition network to recognize the text. We deblur only the regions around text boxes rather than complete images. We also track the text boxes across frames to avoid re-recognizing text in consecutive frames. This significantly improves the performance of the system, as shown by higher classification scores than the state of the art.
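
The control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the four stages are passed in as placeholder callables, the class name `TextPipeline` and the helpers `iou` and `crop` are hypothetical, and tracking is approximated by simple box-overlap (IoU) matching against the previous frame, which is a stand-in for whatever tracker the paper actually uses. It shows two ideas from the abstract: only the cropped region around each text box is deblurred, and a box matched to a box from the previous frame reuses the earlier transcription instead of being recognized again.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels
Image = List[List[int]]           # toy grayscale image as nested lists


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def crop(frame: Image, box: Box, margin: int = 0) -> Image:
    """Cut out the region around a text box, clamped to the frame."""
    x1, y1, x2, y2 = box
    height, width = len(frame), len(frame[0])
    x1, y1 = max(0, x1 - margin), max(0, y1 - margin)
    x2, y2 = min(width, x2 + margin), min(height, y2 + margin)
    return [row[x1:x2] for row in frame[y1:y2]]


@dataclass
class TextPipeline:
    # Placeholder stages (assumptions): any callables with these shapes.
    detect: Callable[[Image], List[Box]]
    deblur: Callable[[Image], Image]
    rectify: Callable[[Image], Image]
    recognize: Callable[[Image], str]
    iou_threshold: float = 0.5    # assumed matching threshold
    margin: int = 2               # assumed padding around each box
    tracks: List[Tuple[Box, str]] = field(default_factory=list)
    recognizer_calls: int = 0

    def process(self, frame: Image) -> List[Tuple[Box, str]]:
        results = []
        for box in self.detect(frame):
            # Reuse the text of an overlapping box from the previous frame.
            text = next(
                (t for b, t in self.tracks if iou(b, box) >= self.iou_threshold),
                None,
            )
            if text is None:
                # New text: deblur only the cropped region, then rectify
                # and recognize it.
                region = crop(frame, box, self.margin)
                text = self.recognize(self.rectify(self.deblur(region)))
                self.recognizer_calls += 1
            results.append((box, text))
        self.tracks = results
        return results


if __name__ == "__main__":
    frame = [[0] * 20 for _ in range(10)]
    pipeline = TextPipeline(
        detect=lambda f: [(2, 2, 10, 6)],
        deblur=lambda r: r,
        rectify=lambda r: r,
        recognize=lambda r: "STOP",
    )
    print(pipeline.process(frame))
    print(pipeline.process(frame), pipeline.recognizer_calls)
```

In this sketch the cost of the deblurring, rectification and recognition stages is paid once per newly seen text box rather than once per box per frame, which is the saving the abstract attributes to tracking.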

Author information

Authors and Affiliations

Delhi Technological University, Bawana Road, New Delhi, India
Sukhad Anand, Seba Susan, Shreshtha Aggarwal, Shubham Aggarwal & Rajat Singla

Corresponding author

Correspondence to Sukhad Anand.

Editor information

Editors and Affiliations

Indian Institute of Information Technology Allahabad, Prayagraj, India
Satish Kumar Singh
Indian Institute of Technology Roorkee, Roorkee, India
Partha Roy
Indian Institute of Technology Roorkee, Roorkee, India
Balasubramanian Raman
Indian Institute of Information Technology Allahabad, Prayagraj, India
P. Nagabhushan

About this paper

Cite this paper

Anand, S., Susan, S., Aggarwal, S., Aggarwal, S., Singla, R. (2021). Scene Text Recognition in the Wild with Motion Deblurring Using Deep Networks. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1378. Springer, Singapore. https://doi.org/10.1007/978-981-16-1103-2_9

DOI: https://doi.org/10.1007/978-981-16-1103-2_9
Published: 26 March 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1102-5
Online ISBN: 978-981-16-1103-2
eBook Packages: Computer Science, Computer Science (R0)