Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks

  • Conference paper
  • First Online:
Advances in Computer Communication and Computational Sciences

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 759)

Abstract

Achieving high accuracy in speech recognition has long been a major challenge in natural language processing, and the GMM-HMM has traditionally been the predominant model for the task. With the rise of deep learning, supported by advances in parallel processing and GPU computing, deep models have produced results that outperform the GMM-HMM. This paper evaluates a deep learning algorithm, the Convolutional Neural Network (CNN), on a dataset of audio (.wav) files capturing recitals of the numerals 0 to 100 in the Punjabi language. The accuracy of the network is evaluated on two versions of the dataset, with and without noise reduction. The CNN gives better results than the baseline GMM-HMM, reducing the error rate by 3.23% for data with noise reduction and by 3.76% for data without noise reduction.
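The abstract does not specify the feature pipeline or the network architecture, so the following is only a minimal illustrative sketch of the kind of setup it describes: fixed-size log-mel spectrograms extracted from the .wav recordings with librosa, fed to a small Keras CNN with 101 output classes (numerals 0 to 100). The sampling rate, frame count, and all layer sizes below are assumptions for illustration, not details taken from the paper.

```python
# Sketch only: assumed log-mel features and an assumed small 2-D CNN;
# the paper's actual features, architecture, and hyperparameters are not
# given in the abstract.
import numpy as np
import librosa
import tensorflow as tf

NUM_CLASSES = 101          # Punjabi numerals 0..100
SAMPLE_RATE = 16000        # assumed sampling rate
N_MELS, MAX_FRAMES = 40, 100

def wav_to_logmel(path):
    """Load a .wav file and convert it to a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=SAMPLE_RATE)
    mel = librosa.feature.melspectrogram(y=y, sr=SAMPLE_RATE, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel)
    # Pad or truncate along the time axis so every utterance has the same shape.
    if logmel.shape[1] < MAX_FRAMES:
        logmel = np.pad(logmel, ((0, 0), (0, MAX_FRAMES - logmel.shape[1])))
    return logmel[:, :MAX_FRAMES, np.newaxis]   # shape: (n_mels, frames, 1)

def build_cnn():
    """A small CNN classifier; layer sizes are illustrative assumptions."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MELS, MAX_FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
```

Comparing validation accuracy of such a model trained on the noise-reduced and raw versions of the recordings mirrors the two-dataset evaluation described in the abstract.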



Author information

Corresponding author

Correspondence to Thakur Aditi.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Aditi, T., Karun, V. (2019). Speech Recognition of Punjabi Numerals Using Convolutional Neural Networks. In: Bhatia, S., Tiwari, S., Mishra, K., Trivedi, M. (eds) Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 759. Springer, Singapore. https://doi.org/10.1007/978-981-13-0341-8_6
