Abstract
We propose a two stage noise reduction system for reducing background noise using single-microphone recordings in very low signal-to-noise ratio (SNR) based on Wiener filtering and ideal binary masking. The proposed system contains two stages. In first stage, the Wiener filtering with improved a priori SNR is applied to noisy speech for background noise reduction. In second stage, the ideal binary mask is estimated at every time–frequency channel by using pre-processed first stage speech and comparing the time–frequency channels against a pre-selected threshold T to reduce the residual noise. The time–frequency channels satisfying the threshold are preserved whereas all other time–frequency channels are attenuated. The results revealed substantial improvements in speech intelligibility and quality over that accomplished with the traditional noise reduction algorithms and unprocessed speech.
Similar content being viewed by others
References
Abd El-Fattah, M. A., Dessouky, M. I., Abbas, A. M., Diab, S. M., El-Rabaie, S. M., & Al-Nuaimy, W., et al. (2014). Speech enhancement with an adaptive Wiener filter. International Journal of Speech Technology, 17(1), 53–64. doi:10.1007/s10772-013-9205-5.
Boldt, J. B., & Ellis, D. (2009). A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation. In Proc. EUSIPCO’09, Glasgow, August 2009 (pp. 1849–1853).
Boldt, J. B., Kjems, U., Pedersen, M. S., Lunner, T., & Wang, D. (2008). Estimation of the ideal binary mask using directional systems. In Proc. int. workshop acoust. echo and noise control (pp. 1–4)
Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. In IEEE transactions on acoustics, speech, and signal processing, ASSP (Vol. 27, pp. 113–120). doi:10.1109/TASSP.1979.1163209.
Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121. doi:10.1109/TASSP.1984.1164453.
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. In IEEE transactions on acoustics, speech, signal processing, ASSP (Vol. 23, No. 2, pp. 443–445). doi:10.1109/TASSP.1985.1164550.
Hansen, J., & Pellom, B. (1998). An effective quality evaluation protocol for speech enhancement algorithms. In International Conference on Spoken Language Processing, 7(2819), 2822.
Hirsch, H., & Pearce, D. (2000). The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: ISCA ITRW ASR2000, Paris.
Hu, Y., & Loizou, P. (2007). Subjective evaluation and comparison of speech enhancement algorithms. Speech Communication, 49(7–8), 588–601. doi:10.1016/j.specom.2006.12.006.
ITU-T P.835. (2003). Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm.
ITU-T Recommendation P.56. (1993). Objective measurement of active speech level.
Klatt, D. (1982). Prediction of perceived phonetic distance from critical band spectra. In Proc. IEEE int. conf. acoust., speech, signal processing (Vol. 7, pp. 1278–1281). doi:10.1109/ICASSP.1982.1171512.
Kitawaki, N., Nagabuchi, H., & Itoh, K. (1988). Objective quality evaluation for low bit-rate speech coding systems. IEEE Journal on Selected Areas in Communications, 6(2), 262–273. doi:10.1109/49.601.
Lim, J, & Oppenheim, A. V. (1978). All-pole modeling of degraded speech. In IEEE trans. acoust., speech, signal proc., ASSP (Vol. 26, No. 3, pp. 197–210). doi:10.1109/TASSP.1978.1163086.
Loizou, P. C. (2007). Speech enhancement: Theory and practice. Boca Raton, FL: CRC Press.
Loizou, P. C. (2009). An algorithm that improves speech intelligibility in noise for normal-hearing listeners. The Journal of the Acoustical Society of America, 126(23), 1486–1494. doi:10.1121/1.3184603.
Quackenbush, S., Barnwell, T., & Clements, M. (1988). Objective measures of speech quality. Eaglewood Cliffs, NJ: Prentice-Hall.
Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. In Acoustics, speech, and signal processing ICASSP. doi:10.1109/ICASSP.2001.941023.
Saleem, N., Mustafa, E., Nawaz, A., & Khan, A. (2015a). Ideal binary masking for reducing convolutive noise. International Journal of Speech Technology, 18(4), 547–554. doi:10.1007/s10772-015-9298-0.
Saleem, N., Shafi, M., Mustafa, E., & Nawaz, A. (2015b). A novel binary mask estimation based on spectral subtraction gain-induced distortions for improved speech intelligibility and quality. Technical Journal, UET, Taxila, 20(4), 35–42.
Scalart, P., & Filho, J. (1996). Speech enhancement based on a priori signal to noise estimation. In Proc. IEEE int. conf. acoust., speech, signal processing (pp. 629–632). doi:10.1109/ICASSP.1996.543199.
Wang, D. (2005). On ideal binary mask as the computational goal of auditory scene analysis. In Speech separation by humans and machines (pp. 181–197). doi:10.1007/0-387-22794-6_12.
Wang, D. (2008). Time-frequency masking for speech separation and its potential for hearing aid design. Trends in Amplification, 12(4), 332–353. doi:10.1177/1084713808326455.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Saleem, N. Single channel noise reduction system in low SNR. Int J Speech Technol 20, 89–98 (2017). https://doi.org/10.1007/s10772-016-9391-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10772-016-9391-z