A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios

  • Conference paper
Latent Variable Analysis and Signal Separation (LVA/ICA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10891)

Abstract

In recent years, research on the role of the spectral phase in single-channel speech enhancement has seen a renaissance. One recent proposal is not only to estimate the clean speech phase but also to use this phase estimate as an additional source of information when estimating the clean speech magnitude. To assess the potential benefit of such approaches, this paper systematically explores in which situations additional information about the clean speech phase is most valuable. To this end, we compare the performance of phase-aware and phase-blind clean speech estimators in different noise scenarios, i.e., at different signal-to-noise ratios (SNRs) and for noise sources with different degrees of stationarity. Interestingly, the results indicate that the greatest benefits can be achieved in situations where conventional magnitude-only speech enhancement is most challenging, namely in highly non-stationary noise at low SNRs.

Author information

Corresponding author

Correspondence to Martin Krawczyk-Becker.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Krawczyk-Becker, M., Gerkmann, T. (2018). A Study on the Benefits of Phase-Aware Speech Enhancement in Challenging Noise Scenarios. In: Deville, Y., Gannot, S., Mason, R., Plumbley, M., Ward, D. (eds) Latent Variable Analysis and Signal Separation. LVA/ICA 2018. Lecture Notes in Computer Science, vol 10891. Springer, Cham. https://doi.org/10.1007/978-3-319-93764-9_38

  • DOI: https://doi.org/10.1007/978-3-319-93764-9_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93763-2

  • Online ISBN: 978-3-319-93764-9

  • eBook Packages: Computer Science, Computer Science (R0)
