
Continuous Tamil Speech Recognition technique under non stationary noisy environments

International Journal of Speech Technology

Abstract

Over the last few years, the need for a continuous speech recognition system for the Tamil language has grown considerably. In this work, an efficient Continuous Tamil Speech Recognition (CTSR) technique is proposed for non-stationary noisy environments. The technique consists of two stages: speech enhancement and modelling. In the enhancement stage, a modified Modulation Magnitude Estimation based Spectral Subtraction with Chi-Square Distribution based Noise Estimation (SS–NE) algorithm is proposed to enhance noisy Tamil speech under various non-stationary noise conditions. The enhanced signal is then segmented to extract speech regions from the continuous utterance using a combination of short-time signal energy and spectral centroid features. Twenty-six mel-frequency cepstral coefficients (MFCCs) per frame are found to be optimal and are used as the acoustic feature vector for each frame. Fuzzy C-Means (FCM) clustering is applied to map the extracted feature vectors to discrete symbols, and the evaluation results show that the optimal number of clusters C is 5. Finally, Tamil speech from various speakers is recognized using an Expectation Maximization Gaussian Mixture Model (EM-GMM) with 16 component densities, operating on the FCM-labelled features, in order to reduce the word error rate (WER). Simulation results show that the proposed FCM with EM-GMM model for CTSR improves recognition accuracy by 1.2–4.4% and reduces the WER by 1.6–5.47% compared with existing algorithms under different noisy environments.
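As a rough illustration of the pipeline described above, the sketch below shows a spectral-subtraction front-end, an energy/spectral-centroid speech mask, and a 16-component EM-trained GMM back-end. It is a minimal sketch under stated assumptions, not the published implementation: it uses plain magnitude-domain spectral subtraction rather than the authors' modulation-domain SS–NE estimator, ad-hoc mean-based segmentation thresholds, and scikit-learn's GaussianMixture as a stand-in for the paper's EM-GMM; all function names, frame sizes, and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping, Hann-windowed frames (one frame per row)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def spectral_subtraction(x, frame_len=256, hop=128, noise_frames=6, beta=0.02):
    """Plain magnitude spectral subtraction: the noise spectrum is estimated
    from the first few (assumed speech-free) frames and beta sets the floor.
    This is a simplification of the paper's modulation-domain SS-NE method."""
    frames = frame_signal(x, frame_len, hop)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)            # crude noise estimate
    clean_mag = np.maximum(mag - noise_mag, beta * noise_mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    y = np.zeros(hop * (len(clean) - 1) + frame_len)       # overlap-add resynthesis
    for i, f in enumerate(clean):
        y[i * hop:i * hop + frame_len] += f
    return y


def speech_mask(x, sr, frame_len=256, hop=128):
    """Per-frame speech/non-speech decision from short-time energy and
    spectral centroid, using crude mean-based thresholds."""
    frames = frame_signal(x, frame_len, hop)
    energy = (frames ** 2).sum(axis=1)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    return (energy > 0.5 * energy.mean()) & (centroid > 0.5 * centroid.mean())


def train_word_models(feats_by_word, n_components=16):
    """One diagonal-covariance, EM-trained GMM per word; each value in
    feats_by_word is an (n_frames x n_mfcc) matrix of cepstral features."""
    return {w: GaussianMixture(n_components=n_components,
                               covariance_type="diag").fit(f)
            for w, f in feats_by_word.items()}


def recognize(feat, models):
    """Pick the word whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda w: models[w].score(feat))
```

In the intended flow, the enhanced signal would first be restricted to frames retained by speech_mask, 26 MFCCs would be extracted per frame (for example with librosa's MFCC routine and n_mfcc=26), the feature vectors would be labelled by FCM clustering with C = 5, and each segment would then be scored against the per-word GMMs with recognize.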



Acknowledgements

The authors would like to thank the anonymous reviewers for all their valuable comments and suggestions.

Author information


Corresponding author

Correspondence to M. Kalamani.


About this article


Cite this article

Kalamani, M., Krishnamoorthi, M. & Valarmathi, R. Continuous Tamil Speech Recognition technique under non stationary noisy environments. Int J Speech Technol 22, 47–58 (2019). https://doi.org/10.1007/s10772-018-09580-8

