Robustness of LSTM Neural Networks for the Enhancement of Spectral Parameters in Noisy Speech Signals

Coto-Jiménez, Marvin

doi:10.1007/978-3-030-04497-8_19

Marvin Coto-Jiménez ORCID: orcid.org/0000-0002-6833-9938^15,16

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11289))

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

892 Accesses
4 Citations

Abstract

In this paper, we carry out a comparative performance analysis of Long Short-term Memory (LSTM) Neural Networks for the task of noise reduction. Recent work in this area has shown the advantages of this kind of network for the enhancement of noisy speech, particularly when the training process is performed for specific Signal-to-Noise (SNR) levels.

For application in real-life environments, it is important to test the robustness of the approach without the a priori knowledge of the SNR noise levels, as classical signal processing-based algorithms do. In our experiments, we conduct the training stage with single and multiple noise conditions and perform the comparison of the results with the specific SNR training presented previously in the literature.

For the first time, results give a measure on the independence of the training conditions for the task of noise suppression in speech signals, and shows remarkable robustness of the LSTM for different SNR levels.

Supported by the University of Costa Rica.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abdel-Hamid, O., Mohamed, A.R., Jiang, H., Penn, G.: Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In: Acoustics, Speech and Signal Processing, pp. 4277–4280. IEEE (2012)
Google Scholar
Bagchi, D., Mandel, M.I., Wang, Z., He, Y., Plummer, A., Fosler-Lussier, E.: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 496–503. IEEE (2015)
Google Scholar
Coto-Jiménez, M., Goddard-Close, J., Martínez-Licona, F.: Improving automatic speech recognition containing additive noise using deep denoising autoencoders of LSTM networks. In: Ronzhin, A., Potapova, R., Németh, G. (eds.) SPECOM 2016. LNCS (LNAI), vol. 9811, pp. 354–361. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43958-7_42
Chapter Google Scholar
Deng, L., et al.: Recent advances in deep learning for speech research at Microsoft. In: ICASSP, vol. 26, p. 64 (2013)
Google Scholar
Du, J., Wang, Q., Gao, T., Xu, Y., Dai, L.R., Lee, C.H.: Robust speech recognition with speech enhanced deep neural networks. In: Association (2014)
Google Scholar
Erro, D., Sainz, I., Navas, E., Hernáez, I.: Improved HNM-based vocoder for statistical synthesizers. In: Association (2011)
Google Scholar
Fan, Y., Qian, Y., Xie, F.L., Soong, F.K.: TTS synthesis with bidirectional LSTM based recurrent neural networks. In: Association (2014)
Google Scholar
Feng, X., Zhang, Y., Glass, J.: Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1759–1763. IEEE (2014)
Google Scholar
Gers, F.A., Schraudolph, N.N., Schmidhuber, J.: Learning precise timing with LSTM recurrent networks. J. Mach. Learn. Res. 3(Aug), 115–143 (2002)
MathSciNet MATH Google Scholar
Graves, A., Fernández, S., Schmidhuber, J.: Bidirectional LSTM networks for improved phoneme classification and recognition. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 799–804. Springer, Heidelberg (2005). https://doi.org/10.1007/11550907_126
Chapter Google Scholar
Graves, A., Jaitly, N., Mohamed, A.R.: Hybrid speech recognition with deep bidirectional LSTM. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 273–278. IEEE (2013)
Google Scholar
Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017)
Article MathSciNet Google Scholar
Han, K., He, Y., Bagchi, D., Fosler-Lussier, E., Wang, D.: Deep neural network based spectral feature mapping for robust speech recognition. In: Association (2015)
Google Scholar
Hansen, J.H., Pellom, B.L.: An effective quality evaluation protocol for speech enhancement algorithms. In: Fifth International Conference on Spoken Language Processing (1998)
Google Scholar
Healy, E.W., Yoho, S.E., Wang, Y., Wang, D.: An algorithm to improve speech recognition in noise for hearing-impaired listeners. J. Acoust. Soc. Am. 134(4), 3029–3038 (2013)
Article Google Scholar
Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Sign. Process. Mag. 29(6), 82–97 (2012)
Article Google Scholar
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Article Google Scholar
Huang, J., Kingsbury, B.: Audio-visual deep learning for noise robust speech recognition, pp. 7596–7599. IEEE (2013)
Google Scholar
Ishii, T., Komiyama, H., Shinozaki, T., Horiuchi, Y., Kuroiwa, S. (eds.): In: Interspeech, pp. 3512–3516 (2013)
Google Scholar
Kominek, J., Black, A.W.: The CMU Arctic speech databases. In: Fifth ISCA Workshop on Speech Synthesis (2004)
Google Scholar
Kumar, A., Florencio, D.: Speech enhancement in multiple-noise conditions using deep neural networks. arXiv preprint arXiv:1605.02427 (2016)
Maas, A.L., Le, Q.V., O’Neil, T.M., Vinyals, O., Nguyen, P., Ng, A.Y.: Recurrent neural networks for noise reduction in robust ASR. In: Association (2012)
Google Scholar
Narayanan, A., Wang, D.: Ideal ratio mask estimation using deep neural networks for robust speech recognition, pp. 7092–7096. IEEE (2013)
Google Scholar
Seltzer, M.L., Yu, D., Wang, Y.: An investigation of deep neural networks for noise robust speech recognition, pp. 7398–7402. IEEE (2013)
Google Scholar
Sertsi, P., Boonkla, S., Chunwijitra, V., Kurpukdee, N., Wutiwiwatchai, C.: Robust voice activity detection based on LSTM recurrent neural networks and modulation spectrum. In: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 342–346. IEEE (2017)
Google Scholar
Vincent, E., Watanabe, S., Nugraha, A.A., Barker, J., Marxer, R.: An analysis of environment, microphone and data simulation mismatches in robust speech recognition. Comput. Speech Lang. 46, 535–557 (2017)
Article Google Scholar
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
MathSciNet MATH Google Scholar
Weninger, F., Geiger, J., Wöllmer, M., Schuller, B., Rigoll, G.: Feature enhancement by deep lstm networks for asr in reverberant multisource environments. Comput. Speech Lang. 28(4), 888–902 (2014)
Article Google Scholar
Weninger, F., Watanabe, S., Tachioka, Y., Schuller, B.: Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4623–4627. IEEE (2014)
Google Scholar
Xu, Y., Du, J., Dai, L.R., Lee, C.H.: An experimental study on speech enhancement based on deep neural networks. IEEE Sign. Process. Lett. 21(1), 65–68 (2014)
Article Google Scholar
Zen, H., Sak, H.: Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4470–4474. IEEE (2015)
Google Scholar

Download references

Acknowledgments

This work was supported by the Universidad de Costa Rica.

Author information

Authors and Affiliations

PRIS-Lab, Escuela de Ingeniería Eléctrica, San Pedro, Costa Rica
Marvin Coto-Jiménez
Universidad de Costa Rica, San José, Costa Rica
Marvin Coto-Jiménez

Authors

Marvin Coto-Jiménez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Marvin Coto-Jiménez .

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Mexico City, Mexico
Ildar Batyrshin
Universidad Panamericana, Mexico City, Mexico
María de Lourdes Martínez-Villaseñor
Faculty of Engineering, Universidad Panamericana, Mexico City, Mexico
Hiram Eredín Ponce Espinosa

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Coto-Jiménez, M. (2018). Robustness of LSTM Neural Networks for the Enhancement of Spectral Parameters in Noisy Speech Signals. In: Batyrshin, I., Martínez-Villaseñor, M., Ponce Espinosa, H. (eds) Advances in Computational Intelligence. MICAI 2018. Lecture Notes in Computer Science(), vol 11289. Springer, Cham. https://doi.org/10.1007/978-3-030-04497-8_19

Download citation

DOI: https://doi.org/10.1007/978-3-030-04497-8_19
Published: 03 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04496-1
Online ISBN: 978-3-030-04497-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics