Abstract
Achieving high accuracy in speech recognition has long been a major obstacle in natural language processing, and the model predominantly used for the task has been the Gaussian mixture model-hidden Markov model (GMM-HMM). With the rise of deep learning, aided by advances in parallel processing and GPU computing, deep networks have outperformed the GMM-HMM and displaced it as the leading approach. This paper evaluates the performance of one deep learning algorithm, the convolutional neural network (CNN), on a dataset of audio (.wav) files that capture the recital of numerals from 0 to 100 in the Punjabi language. The accuracy of the network is evaluated on two versions of the dataset: one with noise reduction and one without. The model gives better results than the baseline GMM-HMM, reducing the error rate by 3.23% for data with noise reduction and by 3.76% for data without noise reduction.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aditi, T., Karun, V. (2019). Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 759. Springer, Singapore. https://doi.org/10.1007/978-981-13-0341-8_6
DOI: https://doi.org/10.1007/978-981-13-0341-8_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0340-1
Online ISBN: 978-981-13-0341-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)