Segmentation-Free Keyword Retrieval in Historical Document Images

  • Conference paper
Image Analysis and Recognition (ICIAR 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8814)

Abstract

We present a segmentation-free method for retrieving keywords from degraded historical documents. The method works directly on the grayscale representation and requires no pre-processing to enhance the document images. Each document image is subdivided into overlapping patches of varying sizes, and each patch is described by a bag-of-visual-words descriptor. The patch descriptors are hashed into several hash tables using a kernelized locality-sensitive hashing scheme for efficient retrieval. With this scheme, the search for a keyword is restricted to a small fraction of the patches, namely those stored in the matching entries of the hash tables. Because handwriting varies and the supply of historical documents is limited, we synthesize a small number of samples from the given query to improve the retrieval results.
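
The pipeline described in the abstract (overlapping multi-size patches, a bag-of-visual-words histogram per patch, and multi-table hashing for lookup) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function and class names, the 50% overlap, and the input format (a per-pixel map of visual-word ids) are assumptions made for illustration, and plain random-hyperplane hashing stands in for the kernelized locality-sensitive hashing used in the paper.

```python
import random
from collections import defaultdict


def extract_patches(height, width, sizes, overlap=0.5):
    """Return (top, left, h, w) boxes of overlapping patches for each size.

    `overlap` is the assumed fraction shared by neighbouring patches.
    """
    boxes = []
    for ph, pw in sizes:
        step_y = max(1, int(ph * (1 - overlap)))
        step_x = max(1, int(pw * (1 - overlap)))
        for top in range(0, height - ph + 1, step_y):
            for left in range(0, width - pw + 1, step_x):
                boxes.append((top, left, ph, pw))
    return boxes


def bovw_histogram(word_map, box, vocab_size):
    """L1-normalised histogram of visual-word ids inside one patch.

    word_map[y][x] holds a visual-word id, or -1 where no keypoint
    was assigned (a hypothetical input format).
    """
    top, left, ph, pw = box
    hist = [0.0] * vocab_size
    for y in range(top, top + ph):
        for x in range(left, left + pw):
            word = word_map[y][x]
            if word >= 0:
                hist[word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


class HyperplaneLSH:
    """Several hash tables keyed by sign patterns of random projections.

    A simplified stand-in for kernelized LSH: it hashes in the input
    space instead of a kernel-induced feature space.
    """

    def __init__(self, dim, n_tables=4, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, table_idx, vec):
        # One bit per hyperplane: which side of the plane the vector is on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0
            for plane in self.planes[table_idx]
        )

    def add(self, item, vec):
        for t, table in enumerate(self.tables):
            table[self._key(t, vec)].append(item)

    def candidates(self, vec):
        """Union of the buckets the query falls into, one per table."""
        found = set()
        for t, table in enumerate(self.tables):
            found.update(table.get(self._key(t, vec), []))
        return found
```

In such a sketch, every patch box is indexed once with `add(box, histogram)`. At query time only the boxes returned by `candidates(query_histogram)` need to be compared against the query, which mirrors the reduction to "a small fraction of the patches" described above.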

We have tested our approach on historical document images in Hebrew from the Cairo Genizah collection, and obtained impressive results.

Author information

Corresponding author

Correspondence to Irina Rabaev.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rabaev, I., Dinstein, I., El-Sana, J., Kedem, K. (2014). Segmentation-Free Keyword Retrieval in Historical Document Images. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8814. Springer, Cham. https://doi.org/10.1007/978-3-319-11758-4_40

  • DOI: https://doi.org/10.1007/978-3-319-11758-4_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11757-7

  • Online ISBN: 978-3-319-11758-4

  • eBook Packages: Computer Science, Computer Science (R0)
