Access by content to handwritten archive documents: generic document recognition method and platform for annotations

  • Original Paper
International Journal of Document Analysis and Recognition (IJDAR)

Abstract

This paper presents annotations needed for handwritten archive document retrieval by content. We propose two complementary ways of producing these annotations: automatically by using document image analysis and collectively by using the Internet and manual input by users. A platform for managing these annotations is presented as well as examples of automatic annotations on civil status registers, military forms (tested on 165,000 pages) and naturalization decrees, using a generic method for structured document recognition and handwriting recognition on names. Examples of collective annotations built on automatic annotations are also given. This platform is already open to the public in the reading room of the new building of the Archives départementales des Yvelines and on the Internet. About 1,450,000 images of civil status registers are available for collective annotation as well as 105,000 pages of military forms with automatic annotation of handwritten names.

Author information

Correspondence to Bertrand Coüasnon.

About this article

Cite this article

Coüasnon, B., Camillerapp, J. & Leplumey, I. Access by content to handwritten archive documents: generic document recognition method and platform for annotations. IJDAR 9, 223–242 (2007). https://doi.org/10.1007/s10032-007-0044-2

  • DOI: https://doi.org/10.1007/s10032-007-0044-2
