A fast hierarchical method for multi-script and arbitrary oriented scene text extraction

Gomez, Lluis; Karatzas, Dimosthenis

doi:10.1007/s10032-016-0274-2

Original Paper
Published: 24 September 2016

International Journal on Document Analysis and Recognition (IJDAR), Volume 19, pages 335–349 (2016)
Abstract

Typography and layout lead to the hierarchical organization of text into words, text lines, and paragraphs. This inherent structure is a key property of text in any script and language, yet it has been only minimally leveraged by existing scene text detection methods. This paper addresses the problem of text segmentation in natural scenes from a hierarchical perspective. In contrast to existing methods, we make explicit use of text structure, aiming directly at the detection of region groupings that correspond to text within a hierarchy produced by an agglomerative similarity clustering process over individual regions. We propose an optimal way to construct such a hierarchy by introducing a feature space designed to produce text group hypotheses with high recall, and a novel stopping rule that combines a discriminative classifier with a probabilistic measure of group meaningfulness based on perceptual organization. Results obtained on four standard datasets, covering text in variable orientations and different languages, demonstrate that our algorithm, while trained on a single mixed dataset, outperforms state-of-the-art methods in unconstrained scenarios.
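The grouping-and-stopping idea described in the abstract can be illustrated with a minimal sketch. It assumes each region is reduced to a point in a feature space normalized to the unit cube (here, plain 2-D centroids), builds the hierarchy with naive single-linkage agglomerative clustering, and scores every node with an a contrario number of false alarms (NFA): the number of hierarchy nodes multiplied by the binomial probability of finding at least that many regions inside the group's bounding box under a uniform background. The bounding-box probability, the `min_side` floor, `min_size=3`, the greedy non-overlapping selection, and the optional `classifier` callable are all illustrative assumptions. They stand in for, and are not, the paper's feature space, similarity measure, trained classifier, or exact stopping rule.

```python
import math
from itertools import combinations


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def single_linkage(points):
    """Naive agglomerative clustering; returns every node of the dendrogram
    (singletons first, then each merged cluster) as a frozenset of indices."""
    clusters = [frozenset([i]) for i in range(len(points))]
    nodes = list(clusters)

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        nodes.append(merged)
    return nodes


def nfa(group, points, n_tests, min_side=0.01):
    """Number of false alarms of a group under a uniform background model.
    p is the volume of the group's bounding box in the unit cube (assumption),
    with each side floored at min_side so collinear groups keep p > 0."""
    p = 1.0
    for d in range(len(points[0])):
        vals = [points[i][d] for i in group]
        p *= max(max(vals) - min(vals), min_side)
    p = min(p, 1.0)
    return n_tests * binomial_tail(len(points), len(group), p)


def meaningful_groups(points, eps=1.0, min_size=3, classifier=None):
    """Keep dendrogram nodes with NFA < eps (and accepted by the optional,
    hypothetical classifier), then greedily select non-overlapping nodes
    in order of increasing NFA."""
    nodes = single_linkage(points)
    n_tests = len(nodes)  # conservative: count every node of the hierarchy
    candidates = []
    for g in nodes:
        if len(g) < min_size:
            continue
        score = nfa(g, points, n_tests)
        if score < eps and (classifier is None or classifier(g)):
            candidates.append((score, g))
    candidates.sort(key=lambda c: c[0])
    selected = []
    for _, g in candidates:
        if all(g.isdisjoint(s) for s in selected):
            selected.append(g)
    return selected


if __name__ == "__main__":
    # Five collinear, closely spaced "characters" plus five scattered regions.
    pts = [(0.40, 0.50), (0.42, 0.50), (0.44, 0.50), (0.46, 0.50), (0.48, 0.50),
           (0.05, 0.90), (0.90, 0.10), (0.10, 0.20), (0.85, 0.85), (0.60, 0.05)]
    print(meaningful_groups(pts))  # prints [frozenset({0, 1, 2, 3, 4})]
```

In this toy input, the five-region "text line" has a far lower NFA than any of its subsets or of the larger clusters that absorb scattered regions, so it is the only group kept; every other candidate either fails `min_size` or overlaps it.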

Acknowledgments

This project was supported by the Spanish project TIN2011-24631, the fellowship RYC-2009-05031, and the Catalan government scholarship 2013FI1126.

Author information

Authors and Affiliations

Computer Vision Center, Universitat Autonoma de Barcelona, Edifici O, Campus UAB, 08193, Bellaterra, Barcelona, Spain
Lluis Gomez & Dimosthenis Karatzas
Corresponding author

Correspondence to Lluis Gomez.

About this article

Cite this article

Gomez, L., Karatzas, D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction. IJDAR 19, 335–349 (2016). https://doi.org/10.1007/s10032-016-0274-2

Received: 05 October 2015
Revised: 02 September 2016
Accepted: 06 September 2016
Published: 24 September 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10032-016-0274-2