A Speaker Clustering Algorithm for Fast Speaker Adaptation in Continuous Speech Recognition

Rodríguez, Luis Javier; Torres, M. Inés

doi:10.1007/978-3-540-30120-2_55

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

This paper proposes a speaker adaptation methodology that first automatically determines a number of speaker clusters in the training material, then estimates the parameters of the corresponding models, and finally applies a fast match strategy, based on so-called histogram models, to choose the optimal cluster for each test utterance. The fast match strategy is critical to making this methodology useful in real applications, since carrying out several recognition passes (one for each speaker cluster) and then selecting the decoded string with the highest likelihood would be too costly. Preliminary experiments on two Spanish speech databases reveal that both the clustering algorithm and the fast match strategy are consistent and reliable. Although the histogram models are suboptimal (they guessed the right cluster for unseen test speakers in 85% of the cases with read speech and in 63% of the cases with spontaneous speech), they yielded an error rate reduction of around 6% in phonetic recognition experiments.
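The abstract does not describe how the histogram models are built or scored, so the following is only a minimal sketch of a histogram-based fast match under stated assumptions, not the authors' method. It assumes that every acoustic frame has already been vector-quantized to one of K codewords of a codebook shared by all clusters, that each speaker cluster's histogram model is an additively smoothed distribution of codeword counts over that cluster's training frames, and that frames are scored as independent draws. The function names `train_log_histogram`, `log_score` and `fast_match`, the smoothing weight `alpha`, and the toy data are all hypothetical. The speaker clustering step itself, including how the number of clusters is determined, is not reproduced.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical sketch: assumes every frame has already been vector-quantized
# to a codeword index in [0, codebook_size) with a codebook shared by all clusters.


def train_log_histogram(codes: Sequence[int], codebook_size: int,
                        alpha: float = 1.0) -> List[float]:
    """Return smoothed log-probabilities of each codeword for one cluster.

    Additive smoothing with weight `alpha` (an assumed choice) keeps codewords
    unseen in training from receiving probability zero.
    """
    counts = Counter(codes)
    total = len(codes) + alpha * codebook_size
    return [math.log((counts.get(k, 0) + alpha) / total)
            for k in range(codebook_size)]


def log_score(codes: Sequence[int], log_hist: List[float]) -> float:
    """Log-likelihood of an utterance, treating its frames as independent."""
    return sum(log_hist[k] for k in codes)


def fast_match(codes: Sequence[int], models: Dict[str, List[float]]) -> str:
    """Pick the cluster whose histogram model gives the highest score."""
    return max(models, key=lambda name: log_score(codes, models[name]))


# Toy usage with a 4-codeword codebook and two clusters (made-up data).
K = 4
models = {
    "cluster_A": train_log_histogram([0, 0, 1, 0, 2, 0, 1], K),
    "cluster_B": train_log_histogram([3, 3, 2, 3, 1, 3, 3], K),
}
print(fast_match([0, 1, 0, 0], models))  # cluster_A
```

In this sketch, scoring one cluster costs a single table lookup and addition per frame. That is why a selection step of this kind can be far cheaper than running a full recognition pass per cluster and comparing the likelihoods of the decoded strings.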

Author information

Authors and Affiliations

Pattern Recognition & Speech Technology Group, DEE. Facultad de Ciencia y Tecnología, Universidad del País Vasco, Apartado 644, 48080, Bilbao, Spain
Luis Javier Rodríguez & M. Inés Torres

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Rodríguez, L.J., Torres, M.I. (2004). A Speaker Clustering Algorithm for Fast Speaker Adaptation in Continuous Speech Recognition. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_55

DOI: https://doi.org/10.1007/978-3-540-30120-2_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive