Improving Performance of Speaker Identification Systems Using Score Level Fusion of Two Modes of Operation

Safavi, Saeid; Mporas, Iosif

doi:10.1007/978-3-319-66429-3_43


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

In this paper we present a score-level fusion methodology for improving the performance of closed-set speaker identification. Fusion is performed on the scores produced by two GMM-UBM speaker identification engines, one operating in text-dependent mode and one in text-independent mode. The experimental results indicate that score-level fusion improves speaker identification performance compared with the best-performing single mode of operation.
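The abstract does not state which fusion rule is used. As an illustration only, the sketch below assumes one common choice: for a single test utterance, each engine's per-speaker scores are z-normalised so the two score scales are comparable, the normalised scores are combined by a weighted sum, and the enrolled speaker with the highest fused score is returned (closed-set identification). The function names, the `weight` parameter, and the normalisation step are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev


def z_normalise(scores):
    """Z-normalise one engine's scores across the enrolled speakers for a trial.

    Assumption: per-trial z-normalisation is used here only to put the
    text-dependent and text-independent scores on a comparable scale.
    """
    mu = fmean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All speakers scored identically: only remove the mean.
        return [s - mu for s in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_and_identify(td_scores, ti_scores, weight=0.5):
    """Closed-set identification by weighted-sum score-level fusion (sketch).

    td_scores, ti_scores: one score per enrolled speaker (e.g. GMM-UBM
    log-likelihood ratios) from the text-dependent and text-independent
    engines for the same test utterance.
    weight: hypothetical weight given to the text-dependent engine; in
    practice it would be tuned on development data.
    Returns (index of identified speaker, list of fused scores).
    """
    if len(td_scores) != len(ti_scores):
        raise ValueError("both engines must score the same enrolled speakers")
    td = z_normalise(td_scores)
    ti = z_normalise(ti_scores)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(td, ti)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


if __name__ == "__main__":
    # Toy scores for three enrolled speakers (illustrative values only).
    speaker, fused = fuse_and_identify([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
    print(speaker, [round(f, 4) for f in fused])
```

Setting `weight=1.0` or `weight=0.0` reduces the sketch to a single mode of operation (text-dependent only or text-independent only), which corresponds to the single-mode baselines the abstract compares against.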



Acknowledgement

This work was partially supported by the H2020 OCTAVE Project entitled “Objective Control for TAlker VErification”, funded by the EC under Grant Agreement number 647850.

The authors would like to thank Dr Md Sahidullah, Dr Nicholas Evans and Dr Tomi Kinnunen for their support in this work.

Author information

Authors and Affiliations

School of Engineering and Technology, University of Hertfordshire, College Lane Campus, Hatfield, Hertfordshire, AL10 8PE, UK
Saeid Safavi & Iosif Mporas


Corresponding author

Correspondence to Saeid Safavi.

Editor information

Editors and Affiliations

SPIIRAS, Saint Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Hertfordshire, Hatfield, United Kingdom
Iosif Mporas


About this paper

Cite this paper

Safavi, S., Mporas, I. (2017). Improving Performance of Speaker Identification Systems Using Score Level Fusion of Two Modes of Operation. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_43


DOI: https://doi.org/10.1007/978-3-319-66429-3_43
Published: 13 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3
eBook Packages: Computer Science, Computer Science (R0)
