Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks

  • Conference paper
  • First Online:
Advances in Computer Communication and Computational Sciences

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 759)

Abstract

Achieving high accuracy in speech recognition has long been a major challenge in natural language processing, and the GMM-HMM has traditionally been the predominant model for the task. With the rise of deep learning, supported by advances in parallel processing and GPU computing, deep models have produced results that outperform the GMM-HMM. This paper evaluates a deep learning algorithm, the Convolutional Neural Network (CNN), on a dataset of audio (.wav) files capturing recitals of the numerals 0 to 100 in the Punjabi language. The accuracy of the network is evaluated on two versions of the dataset, with and without noise reduction. The CNN gives better results than the baseline GMM-HMM, reducing the error rate by 3.23% for data with noise reduction and by 3.76% for data without noise reduction.
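The abstract does not specify the feature pipeline or the network architecture, so the following is only a minimal illustrative sketch of the kind of setup it describes: fixed-size log-mel spectrograms extracted from the .wav recordings with librosa, fed to a small Keras CNN with 101 output classes (numerals 0 to 100). The sampling rate, frame count, and all layer sizes below are assumptions for illustration, not details taken from the paper.

```python
# Sketch only: assumed log-mel features and an assumed small 2-D CNN;
# the paper's actual features, architecture, and hyperparameters are not
# given in the abstract.
import numpy as np
import librosa
import tensorflow as tf

NUM_CLASSES = 101          # Punjabi numerals 0..100
SAMPLE_RATE = 16000        # assumed sampling rate
N_MELS, MAX_FRAMES = 40, 100

def wav_to_logmel(path):
    """Load a .wav file and convert it to a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=SAMPLE_RATE)
    mel = librosa.feature.melspectrogram(y=y, sr=SAMPLE_RATE, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel)
    # Pad or truncate along the time axis so every utterance has the same shape.
    if logmel.shape[1] < MAX_FRAMES:
        logmel = np.pad(logmel, ((0, 0), (0, MAX_FRAMES - logmel.shape[1])))
    return logmel[:, :MAX_FRAMES, np.newaxis]   # shape: (n_mels, frames, 1)

def build_cnn():
    """A small CNN classifier; layer sizes are illustrative assumptions."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MELS, MAX_FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
```

Comparing validation accuracy of such a model trained on the noise-reduced and raw versions of the recordings mirrors the two-dataset evaluation described in the abstract.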



Author information

Corresponding author

Correspondence to Thakur Aditi.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Aditi, T., Karun, V. (2019). Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 759. Springer, Singapore. https://doi.org/10.1007/978-981-13-0341-8_6
