Segmentation of Handwritten Characters for Digitalizing Korean Historical Documents

Kim, Min Soo; Cho, Kyu Tae; Kwag, Hee Kue; Kim, Jin Hyung

doi:10.1007/978-3-540-28640-0_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3163)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Historical documents are a valuable cultural heritage and a source for studying the history, society, and daily life of their time. The digitalization of historical documents aims to give researchers and the public instant access to archives that, for preservation reasons, they have had only limited opportunities to consult. However, most of these documents are not only handwritten in ancient Chinese characters but also have complex page layouts. As a result, conventional OCR (optical character recognition) systems are difficult to apply to historical documents, even though OCR has for several years received the most attention as a key module in digitalization. We have been developing an OCR-based digitalization system for historical documents for years. In this paper, we propose dedicated segmentation and rejection methods for the OCR of Korean historical documents. The proposed recognition-based segmentation method uses geometric features and context information with the Viterbi algorithm. The rejection method uses the Mahalanobis distance and the posterior probability, in particular to solve the out-of-class problem. Some promising experimental results are reported.
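The abstract names two components without giving their details: a recognition-based segmentation searched with the Viterbi algorithm, and a rejection rule based on the Mahalanobis distance and the posterior probability. The Python sketch below illustrates both ideas under assumptions that are ours, not the authors': a text line has already been over-segmented into `n` primitive segments; a candidate character spans up to `max_span` consecutive primitives; all scores are log scores that add up; and classes are Gaussian with a shared covariance and equal priors. The names `mahalanobis_sq`, `classify_with_rejection`, `viterbi_segment`, the callbacks `char_score` and `bigram`, and the thresholds `max_dist_sq` and `min_post` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis_sq(x: Vector, mean: Vector, inv_cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mean)^T inv_cov (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(d[i] * inv_cov[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))


def classify_with_rejection(x: Vector, means: Dict[str, Vector], inv_cov: Matrix,
                            max_dist_sq: float, min_post: float):
    """Return (label, posterior), or (None, posterior) if the input is rejected.

    Assumption: Gaussian classes with a shared covariance and equal priors,
    so that P(c | x) is proportional to exp(-0.5 * d_c^2).
    """
    d2 = {c: mahalanobis_sq(x, m, inv_cov) for c, m in means.items()}
    best = min(d2, key=d2.get)
    # Normalise with the log-sum-exp trick to avoid underflow.
    top = max(-0.5 * v for v in d2.values())
    z = sum(math.exp(-0.5 * v - top) for v in d2.values())
    post = math.exp(-0.5 * d2[best] - top) / z
    # Distance test: far from every class mean -> likely an out-of-class pattern.
    # Posterior test: ambiguous between known classes -> low confidence.
    if d2[best] > max_dist_sq or post < min_post:
        return None, post
    return best, post


def viterbi_segment(n: int, max_span: int,
                    char_score: Callable[[int, int], List[Tuple[str, float]]],
                    bigram: Callable[[str, str], float],
                    start: str = "<s>"):
    """Best segmentation of n primitive segments into labelled characters.

    char_score(i, j) lists (label, log_score) for the candidate character made
    of primitives i..j-1; rejected candidates are simply left out.
    bigram(prev, label) returns the context score log P(label | prev).
    Returns (total_log_score, [(i, j, label), ...]).
    """
    # best[j][label] = (score, (i, prev_label)) for the best path that ends
    # at cut point j with a character labelled `label`.
    best = [dict() for _ in range(n + 1)]
    best[0][start] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_span), j):
            for label, s in char_score(i, j):
                for prev, (prev_score, _) in best[i].items():
                    total = prev_score + s + bigram(prev, label)
                    if label not in best[j] or total > best[j][label][0]:
                        best[j][label] = (total, (i, prev))
    if not best[n]:
        return float("-inf"), []  # no admissible segmentation
    label = max(best[n], key=lambda lab: best[n][lab][0])
    score = best[n][label][0]
    path, j = [], n
    while j > 0:  # follow back-pointers from the last cut point
        _, (i, prev) = best[j][label]
        path.append((i, j, label))
        j, label = i, prev
    return score, path[::-1]
```

Keeping the previous character label as the Viterbi state is what lets the bigram context term enter the search; a fuller `char_score` could call `classify_with_rejection` on each candidate's features and add a geometric term (for example, a hypothetical log-likelihood of the candidate's aspect ratio), omitting rejected candidates so the search never commits to them. The distance test is applied in addition to the posterior test because, under this model, a pattern far from every class mean can still get a posterior close to 1: posteriors are normalised only over the known classes, which is exactly the out-of-class problem.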

Author information

Authors and Affiliations

CS Div, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Republic of Korea
Min Soo Kim, Kyu Tae Cho & Jin Hyung Kim
Dongbang SnC Co., Ltd., 10th Floor, BaekSang Bldg., Gwanhun-dong, Jongno-gu, Seoul, Korea
Hee Kue Kwag

Editor information

Editors and Affiliations

Dipartimento di Sistemi e Informatica, Università di Firenze, Via di Santa Marta 3, 50139, Firenze, Italy
Simone Marinai
Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI) GmbH, Kaiserslautern, Germany
Andreas R. Dengel

About this paper

Cite this paper

Kim, M.S., Cho, K.T., Kwag, H.K., Kim, J.H. (2004). Segmentation of Handwritten Characters for Digitalizing Korean Historical Documents. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_11

DOI: https://doi.org/10.1007/978-3-540-28640-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive

Societies and partnerships

The International Association for Pattern Recognition
