Abstract
In freestyle handwritten text-lines, words are sometimes inserted with a caret symbol (\(^\wedge \)) for corrections or annotations. Such insertions disrupt the reading sequence of words. In this paper, we aim to line up the words of a text-line so that the result can assist the OCR engine. Previous text-line segmentation techniques in the literature have scarcely addressed this issue. We formulate the task as a path planning problem and propose a novel multi-agent hierarchical reinforcement learning-based architecture to solve it. The method uses no linguistic knowledge. Experiments with the proposed architecture on English and Bengali offline handwriting yielded some interesting results.
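To make the target output concrete, the following is a minimal toy sketch of what "lining up" means. It is not the reinforcement learning architecture described above. It assumes, purely for illustration, that word positions and caret positions are already known; the names `Word`, `caret_x`, and `line_up` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: float                         # left edge of the word on the page (assumed known)
    caret_x: Optional[float] = None  # for inserted words only: x of the caret they attach to


def line_up(words: List[Word]) -> List[str]:
    """Return word texts in reading order, placing inserted words at their caret."""
    def reading_key(w: Word):
        inserted = w.caret_x is not None
        anchor = w.caret_x if inserted else w.x
        # At the same anchor, inserted words are read before baseline words;
        # several words inserted at one caret keep their left-to-right order.
        return (anchor, 0 if inserted else 1, w.x)
    return [w.text for w in sorted(words, key=reading_key)]


# "quick" is written above the line at x=70, but its caret sits just before "fox".
words = [
    Word("the", 0),
    Word("fox", 60),
    Word("jumps", 110),
    Word("quick", 70, caret_x=55),
]

naive = [w.text for w in sorted(words, key=lambda w: w.x)]
print(naive)           # order by raw horizontal position
print(line_up(words))  # order after attaching the insertion to its caret
```

Sorting by raw horizontal position misplaces the inserted word, whereas anchoring it at the caret restores the intended sequence. The paper's contribution is learning this ordering from the image itself, without the hand-supplied positions assumed here.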
Acknowledgment
All the people who contributed to generating the database are gratefully acknowledged. The authors also heartily thank all the consulted linguistic and handwriting experts.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Adak, C., Chaudhuri, B.B., Lin, CT., Blumenstein, M. (2021). Text-line-up: Don’t Worry About the Caret. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_14
DOI: https://doi.org/10.1007/978-3-030-86334-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science, Computer Science (R0)