The IIR Submission to CSLP 2006 Speaker Recognition Evaluation

Lee, Kong-Aik; Sun, Hanwu; Tong, Rong; Ma, Bin; Dong, Minghui; You, Changhuai; Zhu, Donglai; Koh, Chin-Wei Eugene; Wang, Lei; Kinnunen, Tomi; Chng, Eng-Siong; Li, Haizhou

doi:10.1007/11939993_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

This paper describes the design and implementation of a practical automatic speaker recognition system for the CSLP speaker recognition evaluation (SRE). The system is built upon four subsystems that use speaker information from acoustic spectral features. In addition to the conventional spectral features, a novel temporal discrete cosine transform (TDCT) feature is introduced to capture long-term speech dynamics. The speaker information is modeled using two complementary speaker modeling techniques, namely the Gaussian mixture model (GMM) and the support vector machine (SVM). The resulting subsystems are then integrated at the score level through a multilayer perceptron (MLP) neural network. Evaluation results confirm that the feature selection, classifier design, and fusion strategy are successful, giving rise to an effective speaker recognition system.
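As a rough illustration of the TDCT idea mentioned in the abstract, the sketch below applies a DCT along the time trajectory of each cepstral coefficient within a block of consecutive frames and keeps the lowest-order terms. It is only a minimal sketch: the function names `dct_ii` and `tdct_features`, the block length, the number of retained coefficients, the hop size, and the unnormalized DCT-II are all assumptions made for illustration, not the settings or the implementation used in the paper.

```python
import math


def dct_ii(x, num_coeffs):
    """Unnormalized DCT-II of sequence x, keeping the first num_coeffs terms."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n)) for t in range(n))
        for k in range(num_coeffs)
    ]


def tdct_features(frames, block_len=4, num_coeffs=2, hop=1):
    """Compute one TDCT vector per block of block_len consecutive frames.

    frames: list of per-frame cepstral vectors, all of the same length.
    block_len, num_coeffs, hop: hypothetical values chosen for illustration.
    """
    if not frames:
        return []
    dim = len(frames[0])
    features = []
    for start in range(0, len(frames) - block_len + 1, hop):
        block = frames[start:start + block_len]
        vector = []
        for d in range(dim):
            # Time trajectory of the d-th cepstral coefficient in this block.
            trajectory = [frame[d] for frame in block]
            # Low-order DCT terms summarize how the coefficient evolves over time.
            vector.extend(dct_ii(trajectory, num_coeffs))
        features.append(vector)
    return features
```

With these assumed settings, each output vector has `dim * num_coeffs` entries, and a longer `block_len` would cover a longer stretch of speech per feature vector.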

Author information

Authors and Affiliations

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Kong-Aik Lee, Hanwu Sun, Rong Tong, Bin Ma, Minghui Dong, Changhuai You, Donglai Zhu, Tomi Kinnunen & Haizhou Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Chin-Wei Eugene Koh, Lei Wang, Eng-Siong Chng & Haizhou Li

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

Cite this paper

Lee, KA. et al. (2006). The IIR Submission to CSLP 2006 Speaker Recognition Evaluation. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_52

DOI: https://doi.org/10.1007/11939993_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)