Musical-Noise-Free Blind Speech Extraction Based on Higher-Order Statistics Analysis

Saruwatari, Hiroshi; Miyazaki, Ryoichi

doi:10.1007/978-3-319-73031-8_13


Part of the book series: Signals and Communication Technology (SCT)


Abstract

In this chapter, we introduce a musical-noise-free blind speech extraction method that uses a microphone array and is applicable to nonstationary noise. Recent noise reduction studies have found that optimized iterative spectral subtraction (SS) achieves speech enhancement with almost no musical noise, but that method is valid only for stationary noise. The method presented in this chapter consists of (i) iterative blind dynamic noise estimation by, e.g., independent component analysis (ICA) or multichannel Wiener filtering, and (ii) musical-noise-free speech extraction by modified iterative SS, in which iterative SS is applied to each channel while maintaining the multichannel property that the dynamic noise estimators reuse. In relation to the method, we also discuss the justification for applying ICA to signals nonlinearly distorted by SS. Objective and subjective evaluations simulating a real-world hands-free speech communication system reveal that the method outperforms conventional speech enhancement methods.
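The abstract describes a loop: re-estimate the noise from the current multichannel signals, apply weak SS to every channel, and repeat. The NumPy sketch below illustrates that loop under stated assumptions: power-domain subtraction with a small over-subtraction parameter `beta`, a max-based flooring parameter `eta`, and a caller-supplied noise estimator. The helper names (`weak_ss`, `iterative_ss`, `toy_noise_estimator`) are hypothetical, the parameter values are illustrative rather than the optimized values the chapter refers to, and the toy estimator merely stands in for ICA or multichannel Wiener filtering.

```python
import numpy as np
from typing import Callable


def weak_ss(X: np.ndarray, noise_psd: np.ndarray, beta: float, eta: float) -> np.ndarray:
    """One weak power spectral subtraction step (generic form, an assumption).

    X: complex STFT shaped (channels, freqs, frames); noise_psd broadcasts to X.
    beta: small over-subtraction parameter; eta in [0, 1): flooring parameter.
    The gain is real and nonnegative, so each channel keeps its noisy phase.
    """
    power = np.abs(X) ** 2
    # Subtract beta * noise PSD, but never go below the floor eta^2 * |X|^2.
    floored = np.maximum(power - beta * noise_psd, (eta ** 2) * power)
    gain = np.sqrt(floored / np.maximum(power, 1e-12))
    return gain * X


def iterative_ss(X: np.ndarray,
                 estimate_noise_psd: Callable[[np.ndarray], np.ndarray],
                 beta: float = 0.5, eta: float = 0.0, n_iter: int = 10) -> np.ndarray:
    """Apply weak SS n_iter times, re-estimating the noise PSD from the
    current multichannel output before every iteration."""
    Y = X.copy()
    for _ in range(n_iter):
        noise_psd = estimate_noise_psd(Y)
        Y = weak_ss(Y, noise_psd, beta, eta)
    return Y


# Hypothetical stand-in estimator: a scaled per-bin mean power over frames.
# It is NOT the ICA- or multichannel-Wiener-based estimator of the chapter.
def toy_noise_estimator(Y: np.ndarray) -> np.ndarray:
    return 0.1 * np.mean(np.abs(Y) ** 2, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.standard_normal((2, 4, 50)) + 1j * rng.standard_normal((2, 4, 50))
Y = iterative_ss(X, toy_noise_estimator, beta=0.5, eta=0.0, n_iter=5)
```

In this sketch every per-iteration gain lies in [0, 1], so the output magnitude never exceeds the input, and the phase of each channel is left untouched. Whether this corresponds to what the chapter calls "maintaining the multichannel property" is an assumption here.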


References

[1] S.F. Boll, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27(2), 113–120 (1979)
[2] M. Berouti, R. Schwartz, J. Makhoul, Enhancement of speech corrupted by acoustic noise, in Proceeding of ICASSP (1979), pp. 208–211
[3] R. McAulay, M. Malpass, Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. Acoust. Speech Signal Process. 28(2), 137–145 (1980)
[4] R. Martin, Spectral subtraction based on minimum statistics, in Proceeding of EUSIPCO (1994), pp. 1182–1185
[5] P.C. Loizou, Speech Enhancement: Theory and Practice (CRC Press, Taylor & Francis Group FL, 2007)
[6] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984)
[7] Y. Ephraim, D. Malah, Speech enhancement using a minimum mean square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985)
[8] T. Lotter, P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Appl. Signal Process. 2005, 1110–1126 (2005)
[9] O. Cappe, Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Trans. Speech Audio Process. 2(2), 345–349 (1994)
[10] Z. Goh, K.-C. Tan, B. Tan, Postprocessing method for suppressing musical noise generated by spectral subtraction. IEEE Trans. Speech Audio Process. 6(3), 287–292 (1998)
[11] Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, Automatic optimization scheme of spectral subtraction based on musical noise assessment via higher-order statistics, in Proceeding of IWAENC (2008)
[12] Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, Musical noise generation analysis for noise reduction methods based on spectral subtraction and MMSE STSA estimation, in Proceeding of ICASSP (2009), pp. 4433–4436
[13] Y. Takahashi, R. Miyazaki, H. Saruwatari, K. Kondo, Theoretical analysis of musical noise in nonlinear noise reduction based on higher-order statistics, in Proceeding of APSIPA Annual Summit and Conference (2012)
[14] K. Yamashita, S. Ogata, T. Shimamura, Spectral subtraction iterated with weighting factors, in Proceeding of IEEE Speech Coding Workshop (2002), pp. 138–140
[15] M.R. Khan, T. Hansen, Iterative noise power subtraction technique for improved speech quality, in Proceeding of ICECE (2008), pp. 391–394
[16] S. Li, J.-Q. Wang, M. Niu, X.-J. Jing, T. Liu, Iterative spectral subtraction method for millimeter-wave conducted speech enhancement. J. Biomed. Sci. Eng. 2010(3), 187–192 (2010)
[17] T. Inoue, H. Saruwatari, Y. Takahashi, K. Shikano, K. Kondo, Theoretical analysis of iterative weak spectral subtraction via higher-order statistics, in Proceeding of IEEE International Workshop on Machine Learning for Signal Processing (2010), pp. 220–225
[18] R. Miyazaki, H. Saruwatari, T. Inoue, Y. Takahashi, K. Shikano, K. Kondo, Musical-noise-free speech enhancement based on optimized iterative spectral subtraction. IEEE Trans. Audio Speech Lang. Process. 20(7), 2080–2094 (2012)
[19] R. Miyazaki, H. Saruwatari, S. Nakamura, K. Shikano, K. Kondo, J. Blanchette, M. Bouchard, Musical-noise-free blind speech extraction integrating microphone array and iterative spectral subtraction. Signal Process. (Elsevier) 102, 226–239 (2014)
[20] P. Comon, Independent component analysis, a new concept? Signal Process. (Elsevier) 36, 287–314 (1994)
[21] S. Araki, R. Mukai, S. Makino, T. Nishikawa, H. Saruwatari, The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process. 11(2), 109–116 (2003)
[22] H. Sawada, R. Mukai, S. Araki, S. Makino, A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 12(5), 530–538 (2004)
[23] H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee, K. Shikano, Blind source separation based on a fast-convergence algorithm combining ICA and beamforming. IEEE Trans. Audio Speech Lang. Process. 14(2), 666–678 (2006)
[24] A. Homayoun, M. Bouchard, Improved noise power spectrum density estimation for binaural hearing aids operating in a diffuse noise field environment. IEEE Trans. Audio Speech Lang. Process. 17(4), 521–533 (2009)
[25] T. Inoue, H. Saruwatari, Y. Takahashi, K. Shikano, K. Kondo, Theoretical analysis of musical noise in generalized spectral subtraction based on higher order statistics. IEEE Trans. Audio Speech Lang. Process. 19(6), 1770–1779 (2011)
[26] H. Yu, T. Fingscheidt, A figure of merit for instrumental optimization of noise reduction algorithms, in Proceeding of DSP in Vehicles (2011)
[27] H. Yu, T. Fingscheidt, Black box measurement of musical tones produced by noise reduction systems, in Proceeding of ICASSP (2012), pp. 4573–4576
[28] S. Kanehara, H. Saruwatari, R. Miyazaki, K. Shikano, K. Kondo, Theoretical analysis of musical noise generation in noise reduction methods with decision-directed a priori SNR estimator, in Proceeding of IWAENC (2012)
[29] S. Kanehara, H. Saruwatari, R. Miyazaki, K. Shikano, K. Kondo, Comparative study on various noise reduction methods with decision-directed a priori SNR estimator via higher-order statistics, in Proceeding of APSIPA Annual Summit and Conference (2012)
[30] R. Miyazaki, H. Saruwatari, K. Shikano, K. Kondo, Musical-noise-free speech enhancement based on iterative Wiener filtering, in Proceeding of IEEE International Symposium on Signal Processing and Information Technology (2012)
[31] S. Nakai, H. Saruwatari, R. Miyazaki, S. Nakamura, K. Kondo, Theoretical analysis of biased MMSE short-time spectral amplitude estimator and its extension to musical-noise-free speech enhancement, in Proceeding of Hands-Free Speech Communication and Microphone Arrays (2014)
[32] H. Saruwatari, Statistical-model-based speech enhancement with musical-noise-free properties, in Proceeding of IEEE International Conference on Digital Signal Processing (2015), pp. 1201–1205
[33] A. Hiroe, Solution of permutation problem in frequency domain ICA using multivariate probability density functions, in Proceeding of ICA (2006), pp. 601–608
[34] T. Kim, H.T. Attias, S.-Y. Lee, T.-W. Lee, Blind source separation exploiting higher-order frequency dependencies. IEEE Trans. Audio Speech Lang. Process. 15(1), 70–79 (2007)
[35] N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, in Proceeding of WASPAA (2011), pp. 189–192
[36] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Relaxation of rank-1 spatial constraint in overdetermined blind source separation, in Proceeding of EUSIPCO (2015), pp. 1271–1275
[37] D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization. IEEE/ACM Trans. Audio Speech Lang. Process. 24(9), 1626–1641 (2016)
[38] Y. Mitsui, D. Kitamura, S. Takamichi, N. Ono, H. Saruwatari, Blind source separation based on independent low-rank matrix analysis with sparse regularization for time-series activity, in Proceeding of ICASSP (2017), pp. 21–25
[39] S. Mogami, D. Kitamura, Y. Mitsui, N. Takamune, H. Saruwatari, N. Ono, Independent low-rank matrix analysis based on complex Student’s $t$-distribution for blind audio source separation, in Proceeding of IEEE International Workshop on Machine Learning for Signal Processing (2017)
[40] F.D. Aprilyanti, J. Even, H. Saruwatari, K. Shikano, S. Nakamura, T. Takatani, Suppression of noise and late reverberation based on blind signal extraction and Wiener filtering. Acoust. Sci. Technol. 36(4), 302–313 (2015)
[41] H. Saruwatari, S. Kurita, K. Takeda, F. Itakura, T. Nishikawa, Blind source separation combining independent component analysis and beamforming. EURASIP J. Appl. Signal Process. 2003, 1135–1146 (2003)
[42] Y. Takahashi, T. Takatani, K. Osako, H. Saruwatari, K. Shikano, Blind spatial subtraction array for speech enhancement in noisy environment. IEEE Trans. Audio Speech Lang. Process. 17(4), 650–664 (2009)
[43] Y. Takahashi, H. Saruwatari, K. Shikano, K. Kondo, Musical-noise analysis in methods of integrating microphone array and spectral subtraction based on higher-order statistics. EURASIP J. Adv. Signal Process. 2010(431347), 25 pages (2010)
[44] H. Saruwatari, Y. Ishikawa, Y. Takahashi, T. Inoue, K. Shikano, K. Kondo, Musical noise controllable algorithm of channelwise spectral subtraction and adaptive beamforming based on higher-order statistics. IEEE Trans. Audio Speech Lang. Process. 19(6), 1457–1466 (2011)
[45] R. Miyazaki, H. Saruwatari, K. Shikano, Theoretical analysis of amounts of musical noise and speech distortion in structure-generalized parametric spatial subtraction array. IEICE Trans. Fundam. E95-A(2), 586–590 (2012)
[46] S. Kurita, H. Saruwatari, S. Kajita, K. Takeda, F. Itakura, Evaluation of blind signal separation method using directivity pattern under reverberant conditions, in Proceeding of ICASSP, vol. 5 (2000), pp. 3140–3143
[47] J. Even, H. Saruwatari, K. Shikano, T. Takatani, Speech enhancement in presence of diffuse background noise: Why using blind signal extraction? in Proceeding of ICASSP (2010), pp. 4770–4773
[48] J. Even, C. Ishi, H. Saruwatari, N. Hagita, Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface, in Proceeding of INTERSPEECH (2010), pp. 977–980
[49] R. Prasad, H. Saruwatari, K. Shikano, Probability distribution of time-series of speech spectral components. IEICE Trans. Fundam. E87-A(3), 584–597 (2004)
[50] R. Prasad, H. Saruwatari, K. Shikano, Estimation of shape parameter of GGD function by negentropy matching. Neural Process. Lett. 22, 377–389 (2005)
[51] T.H. Dat, K. Takeda, F. Itakura, Generalized gamma modeling of speech and its online estimation for speech enhancement, in Proceeding of ICASSP, vol. 4 (2005), pp. 181–184
[52] I. Andrianakis, P.R. White, MMSE speech spectral amplitude estimators with chi and gamma speech priors, in Proceeding of ICASSP (2006), pp. III-1068–III-1071
[53] R. Wakisaka, H. Saruwatari, K. Shikano, T. Takatani, Speech prior estimation for generalized minimum mean-square error short-time spectral amplitude estimator. IEICE Trans. Fundam. E95-A(2), 591–595 (2012)
[54] R. Wakisaka, H. Saruwatari, K. Shikano, T. Takatani, Speech kurtosis estimation from observed noisy signal based on generalized Gaussian distribution prior and additivity of cumulants, in Proceeding of ICASSP (2012), pp. 4049–4052
[55] I. Cohen, Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator. IEEE Signal Process. Lett. 9(4), 113–116 (2002)
[56] H. Buchner, R. Aichner, W. Kellermann, A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans. Speech Audio Process. 13(1), 120–134 (2005)
[57] Y. Mori, H. Saruwatari, T. Takatani, S. Ukai, K. Shikano, T. Hiekata, Y. Ikeda, H. Hashimoto, T. Morita, Blind separation of acoustic signals combining SIMO-model-based independent component analysis and binary masking. EURASIP J. Appl. Signal Process. 2006(34970), 17 pages (2006)
[58] T. Hiekata, Y. Ikeda, T. Yamashita, T. Morita, R. Zhang, Y. Mori, H. Saruwatari, K. Shikano, Development and evaluation of pocket-size real-time blind source separation microphone. Acoust. Sci. Technol. 30(4), 297–304 (2009)
[59] Y. Omura, H. Kamado, H. Saruwatari, K. Shikano, Real-time semi-blind speech extraction with speaker direction tracking on Kinect, in Proceeding of APSIPA Annual Summit and Conference (2012)
[60] Y. Bando, H. Saruwatari, N. Ono, S. Makino, K. Itoyama, D. Kitamura, M. Ishimura, M. Takakusaki, N. Mae, K. Yamaoka, Y. Matsui, Y. Ambe, M. Konyo, S. Tadokoro, K. Yoshii, H.G. Okuno, Low-latency and high-quality two-stage human-voice-enhancement system for a hose-shaped rescue robot. J. Robot. Mechatron. 29(1), 198–212 (2017)


Acknowledgements

This work was partially supported by SECOM Science and Technology Foundation.

Author information

Authors and Affiliations

The University of Tokyo, Tokyo, 113-8656, Japan
Hiroshi Saruwatari
Department of Computer Science and Electronic Engineering, National Institute of Technology, Tokuyama College, Gakuendai, Shunan, Yamaguchi, 745-8585, Japan
Ryoichi Miyazaki


Corresponding author

Correspondence to Hiroshi Saruwatari.

Editor information

Editors and Affiliations

University of Tsukuba, Ibaraki, Japan
Shoji Makino

Appendix

This appendix briefly reviews the time-variant nonlinear noise estimator. For more details, see Ref. [24].

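Let $X_{1}(f,\tau )$ and $X_{2}(f,\tau )$ be the noisy signals received at the left and right microphones in the time-frequency domain, where $f$ and $\tau$ denote the frequency and frame indices, respectively. They are modeled as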

$$\begin{aligned} X_{1}(f,\tau ) = H_{1}(f)S(f,\tau )+N_{1}(f,\tau ),\end{aligned}$$

(13.55)

$$\begin{aligned} X_{2}(f,\tau ) = H_{2}(f)S(f,\tau )+N_{2}(f,\tau ), \end{aligned}$$

(13.56)

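where $S(f,\tau )$ is the target speech signal, $N_{1}(f,\tau )$ and $N_{2}(f,\tau )$ are the noise signals at the two microphones, and $H_{1}(f)$ and $H_{2}(f)$ are the transfer functions from the target signal position to each microphone. Assuming that the target speech and the noise are mutually uncorrelated, the auto-PSDs at the two microphones, $\varGamma _{11}(f,\tau )$ and $\varGamma _{22}(f,\tau )$, can be expressed as follows: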

$$\begin{aligned} \varGamma _{11}(f,\tau ) = |H_{1}(f)|^2 \varGamma _\mathrm{SS}(f,\tau )+\varGamma _\mathrm{NN}(f,\tau ) ,\end{aligned}$$

(13.57)

$$\begin{aligned} \varGamma _{22}(f,\tau ) = |H_{2}(f)|^2 \varGamma _\mathrm{SS}(f,\tau )+\varGamma _\mathrm{NN}(f,\tau ) , \end{aligned}$$

(13.58)

where $\varGamma _\mathrm{SS}(f,\tau )$ is the PSD of the target speech signal and $\varGamma _\mathrm{NN}(f,\tau )$ is the PSD of the noise signal. In this chapter, we assume that the left and right noise PSDs are approximately the same, i.e., $\varGamma _\mathrm{N_1 N_1}(f,\tau ) \simeq \varGamma _\mathrm{N_2 N_2}(f,\tau ) \simeq \varGamma _\mathrm{NN}(f,\tau )$.
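To make the signal model (13.55)–(13.58) concrete, the following NumPy sketch simulates a single frequency bin over many frames and compares sample auto-PSDs with (13.57) and (13.58). It assumes circular complex Gaussian speech and noise, hypothetical values for $H_{1}$, $H_{2}$, $\varGamma _\mathrm{SS}$, and $\varGamma _\mathrm{NN}$, and PSDs held constant over frames so that plain averaging is valid. Real speech is nonstationary, which is why the text writes the PSDs as functions of $\tau$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames = 200_000  # number of frames averaged (hypothetical)

# Hypothetical values at one frequency bin f; PSDs are constant over frames here.
H1, H2 = 0.9 * np.exp(1j * 0.3), 0.6 * np.exp(-1j * 1.1)
gamma_ss, gamma_nn = 2.0, 0.5


def cgauss(var, n):
    """Circular complex Gaussian samples with E|z|^2 = var."""
    return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))


S = cgauss(gamma_ss, n_frames)
N1 = cgauss(gamma_nn, n_frames)   # equal noise PSD at both microphones, as assumed in the text
N2 = cgauss(gamma_nn, n_frames)   # drawn independently of N1 (idealized uncorrelated noise)
X1 = H1 * S + N1                  # (13.55)
X2 = H2 * S + N2                  # (13.56)

gamma11_hat = np.mean(np.abs(X1) ** 2)              # sample auto-PSD, microphone 1
gamma22_hat = np.mean(np.abs(X2) ** 2)              # sample auto-PSD, microphone 2
gamma11_model = abs(H1) ** 2 * gamma_ss + gamma_nn  # (13.57), about 2.12
gamma22_model = abs(H2) ** 2 * gamma_ss + gamma_nn  # (13.58), about 1.22
```

With this many frames the sample auto-PSDs typically agree with the model values to well within one percent.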

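Next, we consider the Wiener solution between the left and right noisy signals, i.e., the filter that minimizes the mean-square error in predicting $X_{1}(f,\tau )$ from $X_{2}(f,\tau )$, which is given by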

$$\begin{aligned} H_\mathrm{W}(f,\tau )=\frac{\varGamma _\mathrm{12}(f,\tau )}{\varGamma _\mathrm{22}(f,\tau )}, \end{aligned}$$

(13.59)

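where $\varGamma _\mathrm{12}(f,\tau )$ is the cross-PSD between the left and right noisy signals. Assuming further that the noise components at the two microphones are uncorrelated with each other and with the target speech, the cross-PSD becomes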

$$\begin{aligned} \varGamma _\mathrm{12}(f,\tau )= \varGamma _\mathrm{SS}(f,\tau )H_\mathrm{1}(f)H^*_\mathrm{2}(f). \end{aligned}$$

(13.60)

Therefore, substituting (13.60) into (13.59) yields

$$\begin{aligned} H_\mathrm{W}(f,\tau )=\frac{\varGamma _\mathrm{SS}(f,\tau )H_\mathrm{1}(f)H^*_\mathrm{2}(f)}{\varGamma _\mathrm{22}(f,\tau )}. \end{aligned}$$

(13.61)
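A similar numerical check applies to the Wiener solution. The sketch below estimates $\varGamma _\mathrm{12}$ and $\varGamma _\mathrm{22}$ by sample averaging and compares $H_\mathrm{W}=\varGamma _\mathrm{12}/\varGamma _\mathrm{22}$ from (13.59) with the closed form (13.61). All numerical values are hypothetical, and the noise at the two microphones is drawn independently, which is the assumption under which (13.60) holds.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # frames averaged (hypothetical)

# Hypothetical values at one frequency bin.
H1, H2 = 0.9 * np.exp(1j * 0.3), 0.6 * np.exp(-1j * 1.1)
gamma_ss, gamma_nn = 2.0, 0.5


def cgauss(var):
    """Circular complex Gaussian samples with E|z|^2 = var."""
    return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))


S = cgauss(gamma_ss)
X1 = H1 * S + cgauss(gamma_nn)   # noise terms independent across microphones
X2 = H2 * S + cgauss(gamma_nn)

gamma12_hat = np.mean(X1 * np.conj(X2))   # cross-PSD estimate of E[X1 X2^*]
gamma22_hat = np.mean(np.abs(X2) ** 2)    # auto-PSD estimate of E|X2|^2
H_W_hat = gamma12_hat / gamma22_hat       # (13.59)

# (13.61), with Gamma_22 replaced by its model (13.58)
H_W_model = gamma_ss * H1 * np.conj(H2) / (abs(H2) ** 2 * gamma_ss + gamma_nn)
```

If the two noise signals were correlated, `gamma12_hat` would pick up an extra noise cross-term and the agreement with (13.61) would break down.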

Furthermore, using (13.57) and (13.58), the squared magnitude response of the Wiener solution in (13.61) can also be expressed as

$$\begin{aligned} |H_\mathrm{W}(f,\tau )|^2\!=\!\frac{(\varGamma _\mathrm{11}(f,\tau )\!-\!\varGamma _\mathrm{NN}(f,\tau ))(\varGamma _\mathrm{22}(f,\tau )\!-\!\varGamma _\mathrm{NN}(f,\tau ))}{\varGamma ^2_\mathrm{22}(f,\tau )}. \end{aligned}$$

(13.62)

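Multiplying both sides of (13.62) by $\varGamma ^2_\mathrm{22}(f,\tau )$ and expanding the product yields the following quadratic equation in $\varGamma _\mathrm{NN}(f,\tau )$: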

$$\begin{aligned}&\varGamma ^2_\mathrm{NN}(f,\tau ) - \varGamma _\mathrm{NN}(f,\tau )\left( \varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau )\right) \nonumber \\&+\varGamma _\mathrm{EE}(f,\tau )\varGamma _\mathrm{22}(f,\tau )=0, \end{aligned}$$

(13.63)

where

$$\begin{aligned} \varGamma _\mathrm{EE}(f,\tau )=\varGamma _\mathrm{11}(f,\tau ) - \varGamma _\mathrm{22}(f,\tau )|H_\mathrm{W}(f,\tau )|^2. \end{aligned}$$

(13.64)
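The rearrangement from (13.62) to (13.63) can be written out step by step as follows, with the arguments $(f,\tau )$ dropped for brevity. The last two lines use $|H_\mathrm{W}|^2\varGamma ^2_\mathrm{22}=|\varGamma _\mathrm{12}|^2$, which follows from (13.59). They show that the discriminant of (13.63) is never negative, so both roots are real. They also show that $\varGamma _\mathrm{EE}=\varGamma _\mathrm{11}-|\varGamma _\mathrm{12}|^2/\varGamma _\mathrm{22}$, which is the power of the Wiener prediction error $X_{1}-H_\mathrm{W}X_{2}$.

$$\begin{aligned} |H_\mathrm{W}|^2\varGamma ^2_\mathrm{22}&=(\varGamma _\mathrm{11}-\varGamma _\mathrm{NN})(\varGamma _\mathrm{22}-\varGamma _\mathrm{NN})=\varGamma _\mathrm{11}\varGamma _\mathrm{22}-\varGamma _\mathrm{NN}(\varGamma _\mathrm{11}+\varGamma _\mathrm{22})+\varGamma ^2_\mathrm{NN},\\ 0&=\varGamma ^2_\mathrm{NN}-\varGamma _\mathrm{NN}(\varGamma _\mathrm{11}+\varGamma _\mathrm{22})+\varGamma _\mathrm{22}\left( \varGamma _\mathrm{11}-\varGamma _\mathrm{22}|H_\mathrm{W}|^2\right) \\ &=\varGamma ^2_\mathrm{NN}-\varGamma _\mathrm{NN}(\varGamma _\mathrm{11}+\varGamma _\mathrm{22})+\varGamma _\mathrm{EE}\varGamma _\mathrm{22},\\ \varGamma _\mathrm{EE}\varGamma _\mathrm{22}&=\varGamma _\mathrm{11}\varGamma _\mathrm{22}-|\varGamma _\mathrm{12}|^2,\\ (\varGamma _\mathrm{11}+\varGamma _\mathrm{22})^2-4\varGamma _\mathrm{EE}\varGamma _\mathrm{22}&=(\varGamma _\mathrm{11}-\varGamma _\mathrm{22})^2+4|\varGamma _\mathrm{12}|^2\ \ge 0. \end{aligned}$$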

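Consequently, the noise PSD $\varGamma _\mathrm{NN}(f,\tau )$ can be estimated by solving the quadratic equation (13.63). Its two roots sum to $\varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau )$, so the larger root is at least their mean and, except in degenerate cases, exceeds $\min \{\varGamma _\mathrm{11}(f,\tau ),\varGamma _\mathrm{22}(f,\tau )\}$, which would make the speech term in (13.57) or (13.58) negative. The smaller root is therefore taken: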

$$\begin{aligned} \varGamma _\mathrm{NN}(f,\tau )&=\frac{1}{2}\left( \varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau )\right) - \varGamma _\mathrm{avg}(f,\tau ), \end{aligned}$$

(13.65)

$$\begin{aligned} \varGamma _\mathrm{avg}(f,\tau )&= \frac{1}{2}\{(\varGamma _\mathrm{11}(f,\tau )+\varGamma _\mathrm{22}(f,\tau ))^2\nonumber \\&-4\varGamma _\mathrm{EE}(f,\tau )\varGamma _\mathrm{22}(f,\tau )\}^{0.5}. \end{aligned}$$

(13.66)
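Equations (13.63)–(13.66) translate directly into a vectorized estimator. The NumPy sketch below computes $\varGamma _\mathrm{NN}$ from given auto- and cross-PSDs. For the time-variant case, it obtains those PSDs by first-order recursive averaging over frames. The function names, the recursive smoothing, and its forgetting factor `alpha` are assumptions for illustration; see Ref. [24] for how the PSDs are actually estimated.

```python
import numpy as np


def estimate_noise_psd(g11, g22, g12):
    """Noise PSD via (13.63)-(13.66), taking the smaller root of the quadratic.

    g11, g22: real auto-PSDs (> 0); g12: complex cross-PSD; arrays broadcast.
    """
    g11 = np.asarray(g11, dtype=float)
    g22 = np.asarray(g22, dtype=float)
    h_w_sq = np.abs(g12) ** 2 / g22 ** 2          # |H_W|^2 with H_W = g12 / g22, (13.59)
    g_ee = g11 - g22 * h_w_sq                      # (13.64)
    disc = (g11 + g22) ** 2 - 4.0 * g_ee * g22     # term under the root in (13.66)
    g_avg = 0.5 * np.sqrt(np.maximum(disc, 0.0))   # guard against tiny negative round-off
    return 0.5 * (g11 + g22) - g_avg               # (13.65)


def recursive_psds(X1, X2, alpha=0.9):
    """Time-varying auto/cross PSDs by first-order recursive averaging.

    X1, X2: complex STFTs shaped (freqs, frames). alpha is a hypothetical
    forgetting factor chosen for illustration, not a value from the chapter.
    """
    g11 = np.empty(X1.shape)
    g22 = np.empty(X1.shape)
    g12 = np.empty(X1.shape, dtype=complex)
    p11 = np.abs(X1[:, 0]) ** 2                    # initialize with the first frame
    p22 = np.abs(X2[:, 0]) ** 2
    p12 = X1[:, 0] * np.conj(X2[:, 0])
    for t in range(X1.shape[1]):
        p11 = alpha * p11 + (1 - alpha) * np.abs(X1[:, t]) ** 2
        p22 = alpha * p22 + (1 - alpha) * np.abs(X2[:, t]) ** 2
        p12 = alpha * p12 + (1 - alpha) * X1[:, t] * np.conj(X2[:, t])
        g11[:, t], g22[:, t], g12[:, t] = p11, p22, p12
    return g11, g22, g12


# Check with exact PSDs (hypothetical H1, H2, Gamma_SS, Gamma_NN at one bin).
H1, H2 = 0.9 * np.exp(1j * 0.3), 0.6 * np.exp(-1j * 1.1)
gamma_ss, gamma_nn = 2.0, 0.5
g11 = abs(H1) ** 2 * gamma_ss + gamma_nn           # (13.57)
g22 = abs(H2) ** 2 * gamma_ss + gamma_nn           # (13.58)
g12 = gamma_ss * H1 * np.conj(H2)                  # (13.60)
gamma_nn_hat = estimate_noise_psd(g11, g22, g12)   # recovers 0.5 up to round-off

# Time-variant use on random STFT-like data shaped (freqs, frames) = (4, 300).
rng = np.random.default_rng(3)
X1 = rng.standard_normal((4, 300)) + 1j * rng.standard_normal((4, 300))
X2 = rng.standard_normal((4, 300)) + 1j * rng.standard_normal((4, 300))
nn_track = estimate_noise_psd(*recursive_psds(X1, X2))
```

With exact PSDs built from (13.57), (13.58), and (13.60), the function returns the true noise PSD up to round-off. Note that (13.65) and (13.66) coincide with the smaller eigenvalue of the $2\times 2$ Hermitian matrix $\left[\begin{smallmatrix}\varGamma _\mathrm{11} & \varGamma _\mathrm{12}\\ \varGamma ^*_\mathrm{12} & \varGamma _\mathrm{22}\end{smallmatrix}\right]$, whose eigenvalues are $\frac{1}{2}(\varGamma _\mathrm{11}+\varGamma _\mathrm{22})\pm \frac{1}{2}\{(\varGamma _\mathrm{11}-\varGamma _\mathrm{22})^2+4|\varGamma _\mathrm{12}|^2\}^{0.5}$. This gives a convenient cross-check and shows that the estimate is nonnegative whenever the estimated PSD matrix is positive semidefinite, as it is under recursive averaging.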


About this chapter

Cite this chapter

Saruwatari, H., Miyazaki, R. (2018). Musical-Noise-Free Blind Speech Extraction Based on Higher-Order Statistics Analysis. In: Makino, S. (eds) Audio Source Separation. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-73031-8_13


DOI: https://doi.org/10.1007/978-3-319-73031-8_13
Published: 02 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73030-1
Online ISBN: 978-3-319-73031-8
eBook Packages: Engineering, Engineering (R0)
