Recognition of Historical Characters by Combination of Method Detecting Character in Photo Image of Document and Method Separating Block to Characters

Sichao, Liao; Miwa, Hiroyoshi

doi:10.1007/978-3-030-39746-3_48


Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 47)

Included in the following conference series:

International Conference on Emerging Internetworking, Data & Web Technologies


Abstract

Japan holds a vast number of historical documents written in a cursive style. Most modern readers cannot read this style because it is no longer taught, so an efficient method for automatically converting historical characters into modern characters is required. In particular, since every page of a Japanese historical document is stored as a photo image, all characters in a photo image must be recognized automatically. However, it is difficult to recognize each historical character separately, because the characters are written connected to one another and because their shapes vary widely. In this paper, we propose a method that combines a deep-learning method for detecting characters in a photo image with a method for separating a block into characters. The parts that the former method cannot recognize are separated into characters by the latter method, which is expected to improve the recognition ratio. We evaluate the performance of the proposed algorithm using photo images of actual documents.
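The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a block is separated, so the sketch assumes a simple split at blank rows (a horizontal projection profile) as a stand-in. The detector is represented only by its output boxes, and the names `mask_detected`, `split_block`, and `recognize_layout` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # binarized page: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def mask_detected(image: Image, boxes: List[Box]) -> Image:
    """Return a copy of the image with every detected box blanked out."""
    out = [row[:] for row in image]
    for top, left, bottom, right in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                out[y][x] = 0
    return out


def split_block(image: Image) -> List[Tuple[int, int]]:
    """Split ink into (start_row, end_row) ranges separated by blank rows.

    Assumption: blank rows mark character boundaries in a vertical column.
    """
    segments = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                    # a new run of inked rows begins
        elif not has_ink and start is not None:
            segments.append((start, y))  # the run ended at a blank row
            start = None
    if start is not None:
        segments.append((start, len(image)))
    return segments


def recognize_layout(image: Image, detected: List[Box]) -> List[Box]:
    """Combine detector boxes with boxes recovered from the leftover ink."""
    leftover = mask_detected(image, detected)
    width = len(image[0]) if image else 0
    recovered = [(top, 0, bottom, width) for top, bottom in split_block(leftover)]
    return detected + recovered
```

Splitting at blank rows cannot separate characters that are written connected, which is exactly the hard case the abstract names; the sketch only shows how the leftover regions from the first stage are passed to a second stage.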



Author information

Authors and Affiliations

Graduate School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda-shi, Hyogo, Japan
Liao Sichao & Hiroyoshi Miwa


Corresponding author

Correspondence to Hiroyoshi Miwa.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Innovation Center for Educational Resources, University Library, Kyushu University, Fukuoka, Japan
Yoshihiro Okada
Department of Electrical Engineering and Information Technology, University of Naples "Federico II", Naples, Italy
Flora Amato


About this paper

Cite this paper

Sichao, L., Miwa, H. (2020). Recognition of Historical Characters by Combination of Method Detecting Character in Photo Image of Document and Method Separating Block to Characters. In: Barolli, L., Okada, Y., Amato, F. (eds) Advances in Internet, Data and Web Technologies. EIDWT 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 47. Springer, Cham. https://doi.org/10.1007/978-3-030-39746-3_48


DOI: https://doi.org/10.1007/978-3-030-39746-3_48
Published: 31 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39745-6
Online ISBN: 978-3-030-39746-3
eBook Packages: Intelligent Technologies and Robotics (R0)
