Improving the self-adaptive voice activity detector for speaker verification using MAP adaptation and asymmetric tapers

Asbai, Nassim; Bengherabi, Messaoud; Amrouche, Abderrahmane; Aklouf, Youcef

doi:10.1007/s10772-014-9260-6

Published: 19 November 2014

Volume 18, pages 195–203, (2015)

International Journal of Speech Technology

Nassim Asbai¹,², Messaoud Bengherabi¹, Abderrahmane Amrouche² & Youcef Aklouf²

Abstract

This paper improves on the recently proposed voice activity detector based on vector quantization and speech enhancement preprocessing (VQ-VAD), and applies it to speaker verification in noisy environments. VQ-VAD computes a likelihood ratio on an utterance-by-utterance basis from the mel-frequency cepstral coefficients (MFCCs) used to train the speech and non-speech models. However, the distinction between speech and non-speech segments in a speech signal is independent of the speaker. We therefore propose a modified VQ-VAD in which two universal background models (UBMs), one for speech and one for non-speech, are trained on long, utterance-independent data. These UBMs are then adapted to each short speaker utterance by maximum a posteriori (MAP) adaptation, replacing the VQ models. The MFCCs are also extracted with the recently proposed asymmetric tapers instead of the traditional Hamming window. Using a GMM–UBM system as the speaker verification baseline, extensive simulations were carried out by adding noise at different levels to the clean TIMIT database, which is characterized by short training and very short testing utterances. The results show the superiority of the proposed GMM-MAP-VAD approach in adverse conditions. Furthermore, a drastic reduction in the equal error rate (EER) is observed when asymmetric tapers are used.
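The adaptation and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, mean-only MAP adaptation in the style of Reynolds et al. (2000), and a relevance factor of 16, none of which the abstract specifies. The function names (`map_adapt_means`, `vad_decisions`) and the zero decision threshold are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Per-component log N(x | mean, diag(var)); returns shape (T, K)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]            # (T, K, D)
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)         # (K,)
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det[None, :] + quad)

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal GMM; returns shape (T,)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    m = log_p.max(axis=1, keepdims=True)                # log-sum-exp trick
    return (m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to the frames X (assumed variant)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)             # responsibilities (T, K)
    n = post.sum(axis=0)                                # soft counts (K,)
    ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]   # data means (K, D)
    alpha = (n / (n + relevance))[:, None]              # adaptation coefficient
    return alpha * ex + (1.0 - alpha) * means

def vad_decisions(X, speech_gmm, nonspeech_gmm, threshold=0.0):
    """Label a frame as speech when its log-likelihood ratio exceeds threshold."""
    llr = gmm_frame_loglik(X, *speech_gmm) - gmm_frame_loglik(X, *nonspeech_gmm)
    return llr > threshold
```

Each model is passed as a `(weights, means, variances)` tuple. How the frames used to adapt the speech and non-speech UBMs are selected from the utterance is left to the caller, since the abstract does not describe that step.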

Author information

Authors and Affiliations

Center for Development of Advanced Technologies (CDTA), Baba Hassen, Algeria
Nassim Asbai & Messaoud Bengherabi
Speech Com. & Signal Proc. Lab., Faculty of Electronics and Computer Sciences, USTHB, 16 111, Bab Ezzouar, Algeria
Nassim Asbai, Abderrahmane Amrouche & Youcef Aklouf

Corresponding author

Correspondence to Nassim Asbai.

About this article

Cite this article

Asbai, N., Bengherabi, M., Amrouche, A. et al. Improving the self-adaptive voice activity detector for speaker verification using map adaptation and asymmetric tapers. Int J Speech Technol 18, 195–203 (2015). https://doi.org/10.1007/s10772-014-9260-6

Received: 04 March 2014
Accepted: 15 October 2014
Published: 19 November 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10772-014-9260-6
