Abstract
Scene text recognition has received increasing attention in the research community. Text in the wild is often irregularly arranged: it may be perspective-distorted, curved, or oriented. Most existing methods do not work well on irregular text, especially severely distorted text. In this paper, we propose a novel progressive rectification network (PRN) for irregular scene text recognition. PRN progressively rectifies irregular text to a front-horizontal view, which in turn boosts recognition performance. Distortions are removed step by step, leveraging the observation that an intermediate rectified result provides good guidance for subsequent, higher-quality rectification. Decomposing the rectification process into multiple steps also considerably reduces the difficulty of each step. Specifically, we first perform a rough rectification and then apply iterative refinement to gradually improve it. To avoid the boundary damage problem that arises when rectified images are fed directly into further iterations, we design an envelope-refinement structure that maintains the integrity of the text throughout the iterative process. Instead of the rectified images, the text line envelope is tracked and continually refined, which implicitly models the transformation information. The original input image is then always used for the transformation based on the refined envelope, so the original character information is preserved until the final transformation. These designs yield high-quality rectification and thereby improve the subsequent recognition. Extensive experiments on eight challenging datasets demonstrate the superiority of our method, especially on irregular benchmarks.
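The envelope-refinement idea can be illustrated with a toy NumPy sketch. It is not the network described in the paper: the abstract does not specify the architecture or the transformation, so every name here is hypothetical. In particular, `refine_envelope` is a stand-in that moves the envelope halfway toward a known target (a real system would predict the update from image features), and the final warp is a simple nearest-neighbour resampling between the top and bottom boundary points. The sketch only shows the control flow: the envelope is what gets iterated on, and the original image is resampled once at the end.

```python
import numpy as np


def refine_envelope(envelope, target, step=0.5):
    """Hypothetical stand-in for a learned refinement step.

    Moves the envelope a fraction `step` of the way toward `target`.
    A real model would not have access to `target`.
    """
    return envelope + step * (target - envelope)


def sample_from_envelope(image, envelope, out_h):
    """Resample the ORIGINAL image once, guided by the envelope.

    `envelope` has shape (2, K, 2): top and bottom boundary points as (x, y).
    Output column k is sampled along the segment from top[k] to bottom[k].
    """
    top, bottom = envelope
    v = np.linspace(0.0, 1.0, out_h)[:, None, None]   # (out_h, 1, 1)
    pts = (1.0 - v) * top[None] + v * bottom[None]     # (out_h, K, 2)
    # Nearest-neighbour lookup, clipped to the image bounds.
    xs = np.clip(np.rint(pts[..., 0]).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.rint(pts[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[ys, xs]


def progressive_rectify(image, init_envelope, target, num_iters=4, out_h=8):
    """Refine the envelope iteratively, then warp the original image once."""
    envelope = init_envelope.astype(float)
    for _ in range(num_iters):
        # Only the envelope is updated; the image itself is never re-warped,
        # so no pixels are lost at the boundary between iterations.
        envelope = refine_envelope(envelope, target)
    return sample_from_envelope(image, envelope, out_h), envelope
```

Calling `progressive_rectify(image, init_envelope, target)` with a 2D grayscale array and two `(2, K, 2)` envelopes returns an `(out_h, K)` rectified patch together with the final envelope. Because intermediate images are never produced, an inaccurate early envelope cannot crop away characters that a later iteration would have recovered.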
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772527, 61806200).
Cite this article
Gao, Y., Chen, Y., Wang, J. et al. Progressive rectification network for irregular text recognition. Sci. China Inf. Sci. 63, 120101 (2020). https://doi.org/10.1007/s11432-019-2710-7
DOI: https://doi.org/10.1007/s11432-019-2710-7