Abstract
Ancient manuscripts are an important record of past civilizations. Unfortunately, such documents are often affected by various age-related degradations, which impair their legibility and information content and spoil their original appearance. In general, these documents are composed of three layers of information: foreground text, background, and unwanted degradation in the form of patterns interfering with the main text. In this work, we present a color-space-based image segmentation technique to separate and remove bleed-through degradation in digitized ancient manuscripts. The aim is to improve their readability and restore their original appearance. For each pixel, a feature vector is created from color spectral and spatial location information. A pixel-based segmentation method using a Gaussian Mixture Model (GMM) is employed, assuming that each feature vector is drawn from a Gaussian distribution. Under this assumption, each pixel is modeled as a draw from a mixture of Gaussian distributions with unknown parameters. The Expectation-Maximization (EM) approach is then used to estimate the unknown GMM parameters. The class label of each pixel is then estimated from its posterior probability under the fitted GMM. Unlike binarization-based document restoration methods, which focus on text extraction, we are more interested in restoring the aesthetically pleasing look of the ancient documents. The experimental results validate the usefulness of the proposed method in terms of successful bleed-through identification and removal, while preserving foreground text and background information.
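The segmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature scaling, the diagonal covariance matrices, the luminance-quantile initialization, the regularization constant, and all function names (`pixel_features`, `posteriors`, `fit_gmm_em`, `segment_pixels`) are assumptions made for illustration. The sketch builds a five-dimensional feature vector (R, G, B, row, column) per pixel, fits a three-component GMM with EM (one component per layer: foreground text, bleed-through, background), and labels each pixel by its maximum posterior probability.

```python
import numpy as np


def pixel_features(image):
    """Build one (R, G, B, row, col) vector per pixel, each scaled to [0, 1]."""
    h, w, _ = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    return np.column_stack([
        image.reshape(-1, 3).astype(float) / 255.0,
        rows.ravel() / max(h - 1, 1),
        cols.ravel() / max(w - 1, 1),
    ])


def posteriors(X, weights, means, variances):
    """E-step: posterior probability of each component for each pixel."""
    sq = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_p = (np.log(weights)
             - 0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
             - 0.5 * sq.sum(axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_gmm_em(X, k=3, n_iter=30, reg=1e-6):
    """Fit a diagonal-covariance GMM with EM (diagonal form is an assumption)."""
    n, _ = X.shape
    # Assumed initialisation: pixels at evenly spaced luminance quantiles.
    order = np.argsort(X[:, :3].sum(axis=1), kind="stable")
    picks = ((np.arange(k) + 0.5) / k * n).astype(int)
    means = X[order[picks]].copy()
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = posteriors(X, weights, means, variances)
        # M-step: re-estimate weights, means and variances from the posteriors.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        sq = (X[:, None, :] - means[None, :, :]) ** 2
        variances = np.einsum("nk,nkd->kd", resp, sq) / nk[:, None] + reg
    return weights, means, variances


def segment_pixels(image, k=3, n_iter=30):
    """Label each pixel with the component of highest posterior probability."""
    X = pixel_features(image)
    params = fit_gmm_em(X, k=k, n_iter=n_iter)
    labels = posteriors(X, *params).argmax(axis=1)
    return labels.reshape(image.shape[:2]), params
```

Given an RGB array `image` of shape (height, width, 3) with 8-bit values, `segment_pixels(image)` returns a label map of the same height and width together with the fitted parameters. Which label corresponds to bleed-through, and how the flagged pixels are then replaced, is left open here; the sketch covers only the labeling step.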
Data Availability
The data used in the experimental section of this paper are publicly available at https://www.isos.dias.ie/.
Funding
No funds or grants were received.
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by ERCIM.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hanif, M., Tonazzini, A., Hussain, S.F. et al. Blind bleed-through removal in color ancient manuscripts. Multimed Tools Appl 82, 12321–12335 (2023). https://doi.org/10.1007/s11042-022-13755-6
DOI: https://doi.org/10.1007/s11042-022-13755-6