Abstract
Document image understanding has engaged the research community for two and a half decades. Databases of multilingual documents require automatic classification for browsing and sorting. This chapter introduces language-based classification of documents and reviews the methods used for language detection. It also proposes a segmentation-free technique for classifying document images by the language used. A hybrid texture feature-extraction scheme based on the stationary wavelet transform (SWT) and the histogram of oriented gradients (HOG) is presented, and a multi-class support vector machine (SVM) is employed for classification. The method is evaluated on a database of 1006 document images in Kannada, Telugu, Marathi, Hindi, and English. It achieves an average detection rate of 87.02%, which the authors report as better than existing techniques.
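To make the idea of a hybrid SWT and HOG texture feature concrete, the following is a minimal sketch in plain Python, not the chapter's implementation. Several choices here are assumptions made only for illustration: a single-level Haar wavelet with averaging normalization and periodic borders, one global orientation histogram per subband (full HOG uses cells and block normalization), nine orientation bins, and simple concatenation of the four histograms. The function names `haar_swt2_level1`, `orientation_histogram`, and `hybrid_features` are hypothetical.

```python
import math


def haar_swt2_level1(img):
    """One-level undecimated (stationary) Haar transform of a 2-D list.

    Assumption: averaging normalization (divide by 2) and periodic borders.
    Returns four subbands (ll, lh, hl, hh), each the same size as img.
    """
    h, w = len(img), len(img[0])
    # Filter along each row without downsampling.
    lo = [[(r[j] + r[(j + 1) % w]) / 2 for j in range(w)] for r in img]
    hi = [[(r[j] - r[(j + 1) % w]) / 2 for j in range(w)] for r in img]

    def along_columns(a, sign):
        # sign=+1 gives the low-pass output, sign=-1 the high-pass output.
        return [[(a[i][j] + sign * a[(i + 1) % h][j]) / 2 for j in range(w)]
                for i in range(h)]

    return (along_columns(lo, 1), along_columns(lo, -1),
            along_columns(hi, 1), along_columns(hi, -1))


def orientation_histogram(band, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    Simplification: one global cell instead of the cell/block layout of HOG.
    """
    h, w = len(band), len(band[0])
    hist = [0.0] * bins
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx = band[i][j + 1] - band[i][j - 1]
            gy = band[i + 1][j] - band[i - 1][j]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            k = min(int(angle / (180.0 / bins)), bins - 1)
            hist[k] += math.hypot(gx, gy)
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0  # avoid division by zero
    return [v / norm for v in hist]


def hybrid_features(img, bins=9):
    """Concatenate the orientation histograms of the four SWT subbands."""
    features = []
    for band in haar_swt2_level1(img):
        features.extend(orientation_histogram(band, bins))
    return features


# Usage: an 8x8 synthetic image of vertical stripes.
stripes = [[1.0 if j % 4 < 2 else 0.0 for j in range(8)] for _ in range(8)]
print(len(hybrid_features(stripes)))  # 36 = 4 subbands x 9 bins
```

In a complete pipeline, one such feature vector per document image would be passed to a multi-class SVM; that training step is omitted here because the abstract does not specify the kernel or parameters.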
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dixit, U.D., Shirdhonkar, M.S. (2019). Language-Based Classification of Document Images Using Hybrid Texture Features. In: Sinha, G. (eds) Advances in Biometrics. Springer, Cham. https://doi.org/10.1007/978-3-030-30436-2_9
DOI: https://doi.org/10.1007/978-3-030-30436-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30435-5
Online ISBN: 978-3-030-30436-2
eBook Packages: Biomedical and Life Sciences (R0)