Scene Text Recognition in the Wild with Motion Deblurring Using Deep Networks

  • Conference paper
Computer Vision and Image Processing (CVIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1378)

Abstract

This paper addresses text detection and recognition in videos. We address two major issues that make it difficult to extract information from video captured by a moving vehicle. The first is motion blur, which is abundant in such video and prevents accurate recognition of text. The second is the orientation of the detected text, which may not be in the same plane. We propose a novel end-to-end pipeline of deep networks: a fully convolutional network to detect text; a Generative Adversarial Network to remove motion blur; a rectification network, which uses Thin-Plate Spline transformations and a Spatial Transformer Network, to handle text that is not straight, i.e. perspective-distorted or curved; and a recognition network to recognize the text. We deblur only the regions around text boxes instead of complete images. We also track the text boxes across frames to avoid re-recognizing text in consecutive frames. This significantly improves the performance of the system, as shown by classification scores higher than those of the state of the art.
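The two efficiency ideas in the abstract (deblurring only the region around each text box, and reusing recognition results for boxes tracked across frames) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which tracker is used, so simple IoU matching between consecutive frames stands in for it here. The names `expand_box`, `iou`, `TextTracker`, the `margin` padding, and the `recognize` callable (a placeholder for the deblur, rectify, and recognize stages) are all hypothetical.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Pad a text box by `margin` pixels, clipped to the frame size.

    The padded box is the region that would be cropped and deblurred,
    instead of deblurring the whole frame.
    """
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class TextTracker:
    """Reuse recognized text for boxes that persist between frames.

    Assumption: IoU matching replaces whatever tracker the paper uses.
    `recognize` is a hypothetical callable covering the expensive
    deblur + rectify + recognize stages for one box.
    """

    def __init__(self, recognize: Callable[[Box], str],
                 iou_threshold: float = 0.5) -> None:
        self.recognize = recognize
        self.iou_threshold = iou_threshold
        self.tracks: List[Tuple[Box, str]] = []  # boxes from the last frame

    def update(self, boxes: List[Box]) -> List[Tuple[Box, str]]:
        """Process the detections of one frame and return (box, text) pairs."""
        updated: List[Tuple[Box, str]] = []
        for box in boxes:
            # Find the previous-frame box that overlaps this one the most.
            best = max(self.tracks, key=lambda t: iou(t[0], box), default=None)
            if best is not None and iou(best[0], box) >= self.iou_threshold:
                text = best[1]              # same text as last frame: reuse it
            else:
                text = self.recognize(box)  # new text: run the full pipeline
            updated.append((box, text))
        self.tracks = updated
        return updated
```

In this sketch, a box that moves only slightly between frames keeps its earlier text, so `recognize` runs once per newly appearing text region rather than once per frame.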

Author information

Corresponding author

Correspondence to Sukhad Anand.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Anand, S., Susan, S., Aggarwal, S., Aggarwal, S., Singla, R. (2021). Scene Text Recognition in the Wild with Motion Deblurring Using Deep Networks. In: Singh, S.K., Roy, P., Raman, B., Nagabhushan, P. (eds) Computer Vision and Image Processing. CVIP 2020. Communications in Computer and Information Science, vol 1378. Springer, Singapore. https://doi.org/10.1007/978-981-16-1103-2_9

  • DOI: https://doi.org/10.1007/978-981-16-1103-2_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-1102-5

  • Online ISBN: 978-981-16-1103-2

  • eBook Packages: Computer Science, Computer Science (R0)
