VML-HP: Hebrew Paleography Dataset

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12824)

Abstract

This paper presents VML-HP, a public dataset for Hebrew paleography analysis. The VML-HP dataset consists of 537 document page images labeled with 15 script sub-types. The ground truth was created manually, at the page level, by a Hebrew paleographer. In addition, we propose a patch generation tool that extracts patches containing an approximately equal number of text lines regardless of variation in font size. The VML-HP dataset contains a train set and two test sets: the first is a typical test set, and the second is a blind test set for evaluating algorithms in a more challenging setting. We evaluated several deep learning classifiers on both test sets. The results show that convolutional networks classify Hebrew script sub-types with much higher accuracy on the typical test set than on the blind test set.
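
The abstract does not say how the patch generation tool keeps the number of text lines per patch roughly constant. One plausible approach, given here purely as an illustration and not as the authors' tool, is to estimate the line pitch (the vertical distance between consecutive text lines) from the page's horizontal projection profile and then scale the patch side to a fixed number of lines. The sketch assumes a binarized page stored as a list of rows (1 = ink), a simple moving-average smoother, naive peak picking, and non-overlapping square patches. All function names (`row_profile`, `estimate_line_pitch`, `patch_size`, `extract_patches`, `synthetic_page`) and the default of 5 lines per patch are hypothetical.

```python
import statistics
from typing import List, Tuple

Page = List[List[int]]  # hypothetical format: binarized page, 1 = ink, 0 = background


def row_profile(page: Page) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in page]


def smooth(profile: List[int], radius: int = 2) -> List[float]:
    """Centered moving average; the window is truncated at the page edges."""
    n = len(profile)
    out: List[float] = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


def estimate_line_pitch(page: Page, radius: int = 2) -> float:
    """Median distance (in rows) between consecutive text-line peaks."""
    p = smooth(row_profile(page), radius)
    # A peak is a strict rise followed by a non-increase, which keeps only the
    # first row of a flat-topped plateau; all-background rows are ignored.
    peaks = [
        i for i in range(1, len(p) - 1)
        if p[i] > p[i - 1] and p[i] >= p[i + 1] and p[i] > 0
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two detectable text lines")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return float(statistics.median(gaps))


def patch_size(page: Page, lines_per_patch: int = 5, radius: int = 2) -> int:
    """Side length of a square patch spanning about `lines_per_patch` lines."""
    return round(lines_per_patch * estimate_line_pitch(page, radius))


def extract_patches(page: Page, lines_per_patch: int = 5,
                    radius: int = 2) -> List[Tuple[int, int, int]]:
    """Non-overlapping square patches as (top, left, size) tuples."""
    size = patch_size(page, lines_per_patch, radius)
    height, width = len(page), len(page[0])
    return [
        (top, left, size)
        for top in range(0, height - size + 1, size)
        for left in range(0, width - size + 1, size)
    ]


def synthetic_page(height: int, width: int, stroke: int, pitch: int) -> Page:
    """Hypothetical test page: solid 'text lines' `stroke` rows tall every `pitch` rows."""
    return [[1 if r % pitch < stroke else 0] * width for r in range(height)]


if __name__ == "__main__":
    small = synthetic_page(100, 120, stroke=4, pitch=10)  # small script
    large = synthetic_page(200, 240, stroke=8, pitch=20)  # script twice as large
    print(patch_size(small), patch_size(large))
    print(extract_patches(small))
```

With these synthetic pages, `patch_size(small)` is 50 and `patch_size(large)` is 100: doubling the line pitch doubles the patch side, so both patches span about five lines. Real manuscript pages would additionally need binarization, skew correction, and more robust peak detection than this sketch provides.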

Acknowledgment

This research was partially supported by the Frankel Center for Computer Science at Ben-Gurion University. The participation of Dr. Vasyutinsky Shapira in this project is funded by the Israeli Ministry of Science, Technology and Space through Yuval Ne’eman scholarship no. 3-16784.

Author information

Correspondence to Ahmad Droby.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Droby, A., Kurar Barakat, B., Vasyutinsky Shapira, D., Rabaev, I., El-Sana, J. (2021). VML-HP: Hebrew Paleography Dataset. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_14

  • DOI: https://doi.org/10.1007/978-3-030-86337-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86336-4

  • Online ISBN: 978-3-030-86337-1

  • eBook Packages: Computer Science, Computer Science (R0)
