Musical-Noise-Free Blind Speech Extraction Based on Higher-Order Statistics Analysis

Audio Source Separation

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In this chapter, we introduce a musical-noise-free blind speech extraction method that uses a microphone array and is applicable to nonstationary noise. A recent noise reduction study found that optimized iterative spectral subtraction (SS) enhances speech with almost no musical noise generation, but that method is valid only for stationary noise. The method presented in this chapter consists of two parts: iterative blind dynamic noise estimation by, e.g., independent component analysis (ICA) or multichannel Wiener filtering, and musical-noise-free speech extraction by modified iterative SS, in which iterative SS is applied to each channel while preserving the multichannel property that the dynamic noise estimators reuse. In relation to the method, we also discuss the justification for applying ICA to signals nonlinearly distorted by SS. Objective and subjective evaluations simulating a real-world hands-free speech communication system show that the method outperforms conventional speech enhancement methods.

References

  1. S.F. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979)

  2. M. Berouti, R. Schwartz, J. Makhoul, Enhancement of speech corrupted by acoustic noise, in Proceeding of ICASSP (1979), pp. 208–211

  3. R. McAulay, M. Malpass, Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. Acoust. Speech Signal Process. 28(2), 137–145 (1980)

  4. R. Martin, Spectral subtraction based on minimum statistics, in Proceeding of EUSIPCO (1994), pp. 1182–1185

  5. P.C. Loizou, Speech Enhancement Theory and Practice (CRC Press, Taylor & Francis Group FL, 2007)

  6. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984)

  7. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985)

  8. T. Lotter, P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Appl. Signal Process. 2005, 1110–1126 (2005)

  9. O. Cappe, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech Audio Process. 2(2), 345–349 (1994)

  10. Z. Goh, K.-C. Tan, B. Tan, Postprocessing method for suppressing musical noise generated by spectral subtraction. IEEE Trans. Speech Audio Process. 6(3), 287–292 (1998)

  11. Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, Automatic optimization scheme of spectral subtraction based on musical noise assessment via higher-order statistics, in Proceeding of IWAENC (2008)

  12. Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, Musical noise generation analysis for noise reduction methods based on spectral subtraction and MMSE STSA estimation, in Proceeding of ICASSP (2009), pp. 4433–4436

  13. Y. Takahashi, R. Miyazaki, H. Saruwatari, K. Kondo, Theoretical analysis of musical noise in nonlinear noise reduction based on higher-order statistics, in Proceeding of APSIPA Annual Summit and Conference (2012)

  14. K. Yamashita, S. Ogata, T. Shimamura, Spectral subtraction iterated with weighting factors, in Proceeding of IEEE Speech Coding Workshop (2002), pp. 138–140

  15. M.R. Khan, T. Hansen, Iterative noise power subtraction technique for improved speech quality, in Proceeding of ICECE (2008), pp. 391–394

  16. S. Li, J.-Q. Wang, M. Niu, X.-J. Jing, T. Liu, Iterative spectral subtraction method for millimeter-wave conducted speech enhancement. J. Biomed. Sci. Eng. 2010(3), 187–192 (2010)

  17. T. Inoue, H. Saruwatari, Y. Takahashi, K. Shikano, K. Kondo, Theoretical analysis of iterative weak spectral subtraction via higher-order statistics, in Proceeding of IEEE International Workshop on Machine Learning for Signal Processing (2010), pp. 220–225

  18. R. Miyazaki, H. Saruwatari, T. Inoue, Y. Takahashi, K. Shikano, K. Kondo, Musical-noise-free speech enhancement based on optimized iterative spectral subtraction. IEEE Trans. Audio Speech Lang. Process. 20(7), 2080–2094 (2012)

  19. R. Miyazaki, H. Saruwatari, S. Nakamura, K. Shikano, K. Kondo, J. Blanchette, M. Bouchard, Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction. Signal Process. (Elsevier) 102, 226–239 (2014)

  20. P. Comon, Independent component analysis, a new concept? Signal Process. (Elsevier) 36, 287–314 (1994)

  21. S. Araki, R. Mukai, S. Makino, T. Nishikawa, H. Saruwatari, The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process. 11(2), 109–116 (2003)

  22. H. Sawada, R. Mukai, S. Araki, S. Makino, A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 12(5), 530–538 (2004)

  23. H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, K. Shikano, Blind source separation based on a fast-convergence algorithm combining ICA and beamforming. IEEE Trans. Audio Speech Lang. Process. 14(2), 666–678 (2006)

  24. A.H. Kamkar-Parsi, M. Bouchard, Improved noise power spectrum density estimation for binaural hearing aids operating in a diffuse noise field environment. IEEE Trans. Audio Speech Lang. Process. 17(4), 521–533 (2009)

  25. T. Inoue, H. Saruwatari, Y. Takahashi, K. Shikano, K. Kondo, Theoretical analysis of musical noise in generalized spectral subtraction based on higher order statistics. IEEE Trans. Audio Speech Lang. Process. 19(6), 1770–1779 (2011)

  26. H. Yu, T. Fingscheidt, A figure of merit for instrumental optimization of noise reduction algorithms, in Proceeding of DSP in Vehicles (2011)

  27. H. Yu, T. Fingscheidt, Black box measurement of musical tones produced by noise reduction systems, in Proceeding of ICASSP (2012), pp. 4573–4576

  28. S. Kanehara, H. Saruwatari, R. Miyazaki, K. Shikano, K. Kondo, Theoretical analysis of musical noise generation in noise reduction methods with decision-directed a priori SNR estimator, in Proceeding of IWAENC (2012)

  29. S. Kanehara, H. Saruwatari, R. Miyazaki, K. Shikano, K. Kondo, Comparative study on various noise reduction methods with decision-directed a priori SNR estimator via higher-order statistics, in Proceeding of APSIPA Annual Summit and Conference (2012)

  30. R. Miyazaki, H. Saruwatari, K. Shikano, K. Kondo, Musical-noise-free speech enhancement based on iterative Wiener filtering, in Proceeding of IEEE International Symposium on Signal Processing and Information Technology (2012)

  31. S. Nakai, H. Saruwatari, R. Miyazaki, S. Nakamura, K. Kondo, Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement, in Proceeding of Hands-Free Speech Communication and Microphone Arrays (2014)

  32. H. Saruwatari, Statistical-model-based speech enhancement with musical-noise-free properties, in Proceeding of IEEE International Conference on Digital Signal Processing (2015), pp. 1201–1205

  33. A. Hiroe, Solution of permutation problem in frequency domain ICA using multivariate probability density functions, in Proceeding of ICA (2006), pp. 601–608

  34. T. Kim, H.T. Attias, S.-Y. Lee, T.-W. Lee, Blind source separation exploiting higher-order frequency dependencies. IEEE Trans. Audio Speech Lang. Process. 15(1), 70–79 (2007)

  35. N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, in Proceeding of WASPAA (2011), pp. 189–192

  36. D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Relaxation of rank-1 spatial constraint in overdetermined blind source separation, in Proceeding of EUSIPCO (2015), pp. 1271–1275

  37. D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization. IEEE/ACM Trans. Audio Speech Lang. Process. 24(9), 1626–1641 (2016)

  38. Y. Mitsui, D. Kitamura, S. Takamichi, N. Ono, H. Saruwatari, Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity, in Proceeding of ICASSP (2017), pp. 21–25

  39. S. Mogami, D. Kitamura, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, Independent low-rank matrix analysis based on complex Student’s \(t\)-distribution for blind audio source separation, in Proceeding of IEEE International Workshop on Machine Learning for Signal Processing (2017)

  40. F.D. Aprilyanti, J. Even, H. Saruwatari, K. Shikano, S. Nakamura, T. Takatani, Suppression of noise and late reverberation based on blind signal extraction and Wiener filtering. Acoust. Sci. Technol. 36(4), 302–313 (2015)

  41. H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, T. Nishikawa, Blind source separation combining independent component analysis and beamforming. EURASIP J. Appl. Signal Process. 2003, 1135–1146 (2003)

  42. Y. Takahashi, T. Takatani, K. Osako, H. Saruwatari, K. Shikano, Blind spatial subtraction array for speech enhancement in noisy environment. IEEE Trans. Audio Speech Lang. Process. 17(4), 650–664 (2009)

  43. Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, Musical-noise analysis in methods of integrating microphone array and spectral subtraction based on higher-order statistics. EURASIP J. Adv. Signal Process. 2010(431347), 25 pages (2010)

  44. H. Saruwatari, Y. Ishikawa, Y. Takahashi, T. Inoue, K. Shikano, K. Kondo, Musical noise controllable algorithm of channelwise spectral subtraction and adaptive beamforming based on higher-order statistics. IEEE Trans. Audio Speech Lang. Process. 19(6), 1457–1466 (2011)

  45. R. Miyazaki, H. Saruwatari, K. Shikano, Theoretical analysis of amounts of musical noise and speech distortion in structure-generalized parametric spatial subtraction array. IEICE Trans. Fundam. 95-A(2), 586–590 (2012)

  46. S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, F. Itakura, Evaluation of blind signal separation method using directivity pattern under reverberant conditions, in Proceeding of ICASSP, vol. 5 (2000), pp. 3140–3143

  47. J. Even, H. Saruwatari, K. Shikano, T. Takatani, Speech enhancement in presence of diffuse background noise: Why using blind signal extraction? in Proceeding of ICASSP (2010), pp. 4770–4773

  48. J. Even, C. Ishi, H. Saruwatari, N. Hagita, Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface, in Proceeding of INTERSPEECH (2010), pp. 977–980

  49. R. Prasad, H. Saruwatari, K. Shikano, Probability distribution of time-series of speech spectral components, IEICE Trans. Fundam. E87-A(3), 584–597 (2004)

  50. R. Prasad, H. Saruwatari, K. Shikano, Estimation of shape parameter of GGD function by negentropy matching. Neural Process. Lett. 22, 377–389 (2005)

  51. T.H. Dat, K. Takeda, F. Itakura, Generalized gamma modeling of speech and its online estimation for speech enhancement, in Proceeding of ICASSP, vol. 4 (2005), pp. 181–184

  52. I. Andrianakis, P.R. White, MMSE speech spectral amplitude estimators with chi and gamma speech priors, in Proceeding of ICASSP (2006), pp. III-1068–III-1071

  53. R. Wakisaka, H. Saruwatari, K. Shikano, T. Takatani, Speech prior estimation for generalized minimum mean-square error short-time spectral amplitude estimator. IEICE Trans. Fundam. 95-A(2), 591–595 (2012)

  54. R. Wakisaka, H. Saruwatari, K. Shikano, T. Takatani, Speech kurtosis estimation from observed noisy signal based on generalized Gaussian distribution prior and additivity of cumulants, in Proceeding of ICASSP (2012), pp. 4049–4052

  55. I. Cohen, Optimal speech enhancement under signal presence uncertainty using log-spectra amplitude estimator. IEEE Signal Process. Lett. 9(4), 113–116 (2002)

  56. H. Buchner, R. Aichner, W. Kellermann, A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans. Speech Audio Process. 13(1), 120–134 (2005)

  57. Y. Mori, H. Saruwatari, T. Takatani, S. Ukai, K. Shikano, T. Hiekata, Y. Ikeda, H. Hashimoto, T. Morita, Blind separation of acoustic signals combining SIMO-model-based independent component analysis and binary masking. EURASIP J. Appl. Signal Process. 2006(34970), 17 pages (2006)

  58. T. Hiekata, Y. Ikeda, T. Yamashita, T. Morita, R. Zhang, Y. Mori, H. Saruwatari, K. Shikano, Development and evaluation of pocket-size real-time blind source separation microphone. Acoust. Sci. Technol. 30(4), 297–304 (2009)

  59. Y. Omura, H. Kamado, H. Saruwatari, K. Shikano, Real-time semi-blind speech extraction with speaker direction tracking on Kinect, in Proceeding of APSIPA Annual Summit and Conference (2012)

  60. Y. Bando, H. Saruwatari, N. Ono, S. Makino, K. Itoyama, D. Kitamura, M. Ishimura, M. Takakusaki, N. Mae, K. Yamaoka, Y. Matsui, Y. Ambe, M. Konyo, S. Tadokoro, K. Yoshii, H.G. Okuno, Low-latency and high-quality two-stage human-voice-enhancement system for a hose-shaped rescue robot. J. Robot. Mechatron. 29(1), 198–212 (2017)

Acknowledgements

This work was partially supported by the SECOM Science and Technology Foundation.

Corresponding author

Correspondence to Hiroshi Saruwatari.

Appendix

This appendix briefly reviews the time-variant nonlinear noise estimator. For details, see Ref. [24].

Let \(X_{1}(f,\tau )\) and \(X_{2}(f,\tau )\) be the noisy signals received at the two microphones in the time-frequency domain, defined as

$$\begin{aligned} X_{1}(f,\tau ) = H_{1}(f)S(f,\tau )+N_{1}(f,\tau ),\end{aligned}$$
(13.55)
$$\begin{aligned} X_{2}(f,\tau ) = H_{2}(f)S(f,\tau )+N_{2}(f,\tau ), \end{aligned}$$
(13.56)

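where \(S(f,\tau )\) is the target speech signal, \(N_{1}(f,\tau )\) and \(N_{2}(f,\tau )\) are the noise components at each microphone, and \(H_{1}(f)\) and \(H_{2}(f)\) are the transfer functions from the target signal position to each microphone. Next, the auto-power spectral densities (PSDs) at each microphone, \(\varGamma _{11}(f,\tau )\) and \(\varGamma _{22}(f,\tau )\), can be expressed as follows: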

$$\begin{aligned} \varGamma _{11}(f,\tau ) = |H_{1}(f)|^2 \varGamma _\mathrm{SS}(f,\tau )+\varGamma _\mathrm{NN}(f,\tau ) ,\end{aligned}$$
(13.57)
$$\begin{aligned} \varGamma _{22}(f,\tau ) = |H_{2}(f)|^2 \varGamma _\mathrm{SS}(f,\tau )+\varGamma _\mathrm{NN}(f,\tau ) , \end{aligned}$$
(13.58)

where \(\varGamma _\mathrm{SS}(f,\tau )\) is the PSD of the target speech signal and \(\varGamma _\mathrm{NN}(f,\tau )\) is the PSD of the noise signal. In this chapter, we assume that the left and right noise PSDs are approximately the same, i.e., \(\varGamma _\mathrm{N_1 N_1}(f,\tau ) \simeq \varGamma _\mathrm{N_2 N_2}(f,\tau ) \simeq \varGamma _\mathrm{NN}(f,\tau )\).
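As a sanity check on (13.57) and (13.58), the following minimal sketch simulates a single time-frequency bin of the model in (13.55) and (13.56) and compares the sample auto-PSDs with the predicted values. The function names `complex_gaussian` and `simulate_bin`, all numerical values, and the choice of independent circular complex Gaussian samples for speech and noise are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def complex_gaussian(rng, n, power):
    """Draw n zero-mean circular complex Gaussian samples with E|z|^2 = power."""
    scale = np.sqrt(power / 2.0)
    return scale * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

def simulate_bin(h1, h2, gamma_ss, gamma_nn, n_frames=200_000, seed=0):
    """Simulate X1 and X2 of (13.55)-(13.56) for one frequency bin."""
    rng = np.random.default_rng(seed)
    s = complex_gaussian(rng, n_frames, gamma_ss)    # target speech S
    n1 = complex_gaussian(rng, n_frames, gamma_nn)   # noise at microphone 1
    n2 = complex_gaussian(rng, n_frames, gamma_nn)   # noise at microphone 2
    return h1 * s + n1, h2 * s + n2

# Hypothetical values chosen only for illustration.
H1, H2 = 0.9 + 0.3j, 0.4 - 0.7j
GAMMA_SS, GAMMA_NN = 2.0, 0.5

x1, x2 = simulate_bin(H1, H2, GAMMA_SS, GAMMA_NN)
gamma_11 = np.mean(np.abs(x1) ** 2)   # sample auto-PSD at microphone 1
gamma_22 = np.mean(np.abs(x2) ** 2)   # sample auto-PSD at microphone 2

print(gamma_11, abs(H1) ** 2 * GAMMA_SS + GAMMA_NN)  # both close to 2.3
print(gamma_22, abs(H2) ** 2 * GAMMA_SS + GAMMA_NN)  # both close to 1.8
```

The two noise sequences are drawn with the same power, which mirrors the assumption that the left and right noise PSDs are approximately equal.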

Next, we consider the Wiener solution between the left and right transfer functions, which is defined as

$$\begin{aligned} H_\mathrm{W}(f,\tau )=\frac{\varGamma _\mathrm{12}(f,\tau )}{\varGamma _\mathrm{22}(f,\tau )}, \end{aligned}$$
(13.59)

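where \(\varGamma _\mathrm{12}(f,\tau )\) is the cross-PSD between the left and right noisy signals. Assuming that the noise components at the two microphones are uncorrelated with each other and with the target speech, the cross-PSD becomes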

$$\begin{aligned} \varGamma _\mathrm{12}(f,\tau )= \varGamma _\mathrm{SS}(f,\tau )H_\mathrm{1}(f)H^*_\mathrm{2}(f). \end{aligned}$$
(13.60)

Therefore, substituting (13.60) into (13.59) yields

$$\begin{aligned} H_\mathrm{W}(f,\tau )=\frac{\varGamma _\mathrm{SS}(f,\tau )H_\mathrm{1}(f)H^*_\mathrm{2}(f)}{\varGamma _\mathrm{22}(f,\tau )}. \end{aligned}$$
(13.61)
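In practice, the PSDs in (13.59) have to be estimated from the observed frames. The excerpt above does not say how this is done; the sketch below assumes first-order recursive averaging over frames, with a hypothetical smoothing factor `alpha` and a small hypothetical floor `eps` to avoid division by zero. The function names are illustrative.

```python
import numpy as np

def recursive_cross_psd(x, y, alpha=0.9):
    """Track E[x * conj(y)] over frames by first-order recursive averaging.

    x, y: complex arrays of shape (n_frames, n_bins).
    Returns an array of the same shape; passing y = x gives the auto-PSD.
    """
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    inst = x * np.conj(y)                 # instantaneous (cross-)periodogram
    psd = np.empty_like(inst)
    psd[0] = inst[0]                      # initialise with the first frame
    for t in range(1, inst.shape[0]):
        psd[t] = alpha * psd[t - 1] + (1.0 - alpha) * inst[t]
    return psd

def wiener_solution(x1, x2, alpha=0.9, eps=1e-12):
    """Estimate H_W = Gamma_12 / Gamma_22 of (13.59) for every frame and bin."""
    gamma_12 = recursive_cross_psd(x1, x2, alpha)
    gamma_22 = recursive_cross_psd(x2, x2, alpha).real
    return gamma_12 / np.maximum(gamma_22, eps)

# Noise-free check: with X1 = H1*S and X2 = H2*S, (13.61) reduces to H1/H2.
rng = np.random.default_rng(1)
s = rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))
h1, h2 = 0.9 + 0.3j, 0.4 - 0.7j
h_w = wiener_solution(h1 * s, h2 * s)
print(np.allclose(h_w, h1 / h2))
```

A larger `alpha` gives smoother but slower-reacting PSD estimates, which is the usual trade-off when the noise is nonstationary.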

Furthermore, using (13.57) and (13.58), the squared magnitude response of the Wiener solution in (13.61) can also be expressed as

$$\begin{aligned} |H_\mathrm{W}(f,\tau )|^2\!=\!\frac{(\varGamma _\mathrm{11}(f,\tau )\!-\!\varGamma _\mathrm{NN}(f,\tau ))(\varGamma _\mathrm{22}(f,\tau )\!-\!\varGamma _\mathrm{NN}(f,\tau ))}{\varGamma ^2_\mathrm{22}(f,\tau )}. \end{aligned}$$
(13.62)
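The step from (13.61) to (13.62) can be spelled out as follows, using only the quantities defined above. Because \(\varGamma _\mathrm{SS}(f,\tau )\) is real and nonnegative, taking the squared magnitude of (13.61) gives

$$\begin{aligned} |H_\mathrm{W}(f,\tau )|^2=\frac{\left( |H_{1}(f)|^2\varGamma _\mathrm{SS}(f,\tau )\right) \left( |H_{2}(f)|^2\varGamma _\mathrm{SS}(f,\tau )\right) }{\varGamma ^2_\mathrm{22}(f,\tau )}, \end{aligned}$$

and (13.57) and (13.58) give \(|H_{1}(f)|^2\varGamma _\mathrm{SS}(f,\tau )=\varGamma _\mathrm{11}(f,\tau )-\varGamma _\mathrm{NN}(f,\tau )\) and \(|H_{2}(f)|^2\varGamma _\mathrm{SS}(f,\tau )=\varGamma _\mathrm{22}(f,\tau )-\varGamma _\mathrm{NN}(f,\tau )\). Substituting these two identities into the numerator yields (13.62).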

Equation (13.62) is rearranged into the following quadratic equation:

$$\begin{aligned}&\varGamma ^2_\mathrm{NN}(f,\tau ) - \varGamma _\mathrm{NN}(f,\tau )\left( \varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau )\right) \nonumber \\&+\varGamma _\mathrm{EE}(f,\tau )\varGamma _\mathrm{22}(f,\tau )=0, \end{aligned}$$
(13.63)

where

$$\begin{aligned} \varGamma _\mathrm{EE}(f,\tau )=\varGamma _\mathrm{11}(f,\tau ) - \varGamma _\mathrm{22}(f,\tau )|H_\mathrm{W}(f,\tau )|^2. \end{aligned}$$
(13.64)
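The rearrangement can be traced term by term; the arguments \((f,\tau )\) are omitted here for brevity. Multiplying (13.62) by \(\varGamma ^2_\mathrm{22}\) and expanding the product on the right-hand side gives

$$\begin{aligned} \varGamma ^2_\mathrm{22}|H_\mathrm{W}|^2=\varGamma _\mathrm{11}\varGamma _\mathrm{22}-\varGamma _\mathrm{NN}\left( \varGamma _\mathrm{11}+\varGamma _\mathrm{22}\right) +\varGamma ^2_\mathrm{NN}. \end{aligned}$$

Moving all terms to one side leaves the constant term \(\varGamma _\mathrm{11}\varGamma _\mathrm{22}-\varGamma ^2_\mathrm{22}|H_\mathrm{W}|^2=\varGamma _\mathrm{22}\left( \varGamma _\mathrm{11}-\varGamma _\mathrm{22}|H_\mathrm{W}|^2\right) =\varGamma _\mathrm{EE}\varGamma _\mathrm{22}\), which is the last term of (13.63).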

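Consequently, the noise PSD \(\varGamma _\mathrm{NN}(f,\tau )\) can be estimated by solving the quadratic equation in (13.63). Since (13.57) and (13.58) imply that \(\varGamma _\mathrm{NN}(f,\tau )\) cannot exceed either auto-PSD, the smaller of the two roots is taken: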

$$\begin{aligned} \varGamma _\mathrm{NN}(f,\tau )&=\frac{1}{2}\left( \varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau )\right) - \varGamma _\mathrm{avg}(f,\tau ), \end{aligned}$$
(13.65)
$$\begin{aligned} \varGamma _\mathrm{avg}(f,\tau )&= \frac{1}{2}\{(\varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau ))^2\nonumber \\&-4\varGamma _\mathrm{EE}(f,\tau )\varGamma _\mathrm{22}(f,\tau )\}^{0.5}. \end{aligned}$$
(13.66)
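The estimator in (13.59) and (13.64) to (13.66) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `estimate_noise_psd`, the floor `eps`, and the clipping of the discriminant and of the result at zero are assumptions added here to keep the computation well behaved when the PSD estimates are noisy or inconsistent.

```python
import numpy as np

def estimate_noise_psd(gamma_11, gamma_22, gamma_12, eps=1e-12):
    """Noise PSD from auto-PSDs gamma_11, gamma_22 and cross-PSD gamma_12.

    Inputs may be scalars or arrays of matching shape (e.g. frames x bins).
    """
    gamma_11 = np.asarray(gamma_11, dtype=float)
    gamma_22 = np.asarray(gamma_22, dtype=float)
    gamma_12 = np.asarray(gamma_12, dtype=complex)

    # |H_W|^2 with H_W = Gamma_12 / Gamma_22, as in (13.59)
    h_w_sq = np.abs(gamma_12) ** 2 / np.maximum(gamma_22, eps) ** 2
    # Gamma_EE of (13.64)
    gamma_ee = gamma_11 - gamma_22 * h_w_sq
    # Gamma_avg of (13.66); the discriminant is clipped only as a numerical guard
    disc = (gamma_11 + gamma_22) ** 2 - 4.0 * gamma_ee * gamma_22
    gamma_avg = 0.5 * np.sqrt(np.maximum(disc, 0.0))
    # Smaller root of the quadratic, as in (13.65); negative values are clipped
    gamma_nn = 0.5 * (gamma_11 + gamma_22) - gamma_avg
    return np.maximum(gamma_nn, 0.0)

# Hypothetical exact PSDs for |H1|^2 = 0.9, |H2|^2 = 0.65, Gamma_SS = 2, Gamma_NN = 0.5:
# Gamma_11 = 2.3, Gamma_22 = 1.8, Gamma_12 = 2 * (0.9 + 0.3j) * conj(0.4 - 0.7j) = 0.3 + 1.5j
print(estimate_noise_psd(2.3, 1.8, 0.3 + 1.5j))  # approximately 0.5
```

With exact PSDs the estimator recovers the noise PSD used to build the example. When the cross-PSD is zero, so that the two channels share no correlated component, the estimate equals the common auto-PSD, as expected for pure noise.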

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Saruwatari, H., Miyazaki, R. (2018). Musical-Noise-Free Blind Speech Extraction Based on Higher-Order Statistics Analysis. In: Makino, S. (ed.) Audio Source Separation. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-73031-8_13

  • DOI: https://doi.org/10.1007/978-3-319-73031-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73030-1

  • Online ISBN: 978-3-319-73031-8

  • eBook Packages: Engineering, Engineering (R0)
