Improving Performance of Speaker Identification Systems Using Score Level Fusion of Two Modes of Operation

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

In this paper, we present a score-level fusion methodology for improving the performance of closed-set speaker identification. The fusion is performed on scores extracted from GMM-UBM text-dependent and text-independent speaker identification engines. The experimental results indicate that score-level fusion improves speaker identification performance compared with the best-performing single mode of operation.
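As a rough illustration of what score-level fusion means for closed-set identification, the sketch below combines the per-speaker scores of a text-dependent and a text-independent engine for a single test utterance. It assumes a weighted-sum rule over z-normalized scores, which is one common fusion scheme and not necessarily the one used in the paper. The function names, the `weight` parameter, and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one utterance's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are equal, so they carry no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(td_scores, ti_scores, weight=0.5):
    """Weighted sum of normalized scores (assumed fusion rule).

    td_scores and ti_scores hold one score per enrolled speaker, in the
    same speaker order, from the text-dependent and text-independent
    engines respectively. weight is the share given to the
    text-dependent engine.
    """
    if len(td_scores) != len(ti_scores):
        raise ValueError("both engines must score the same speaker set")
    td = z_normalize(td_scores)
    ti = z_normalize(ti_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(td, ti)]


def identify(td_scores, ti_scores, weight=0.5):
    """Closed-set decision: index of the speaker with the top fused score."""
    fused = fuse_scores(td_scores, ti_scores, weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for three enrolled speakers.
    td = [1.2, 0.9, 1.1]
    ti = [-0.5, 0.4, 0.6]
    print("fused decision:", identify(td, ti))
```

Normalizing before the weighted sum keeps either engine from dominating only because its raw scores span a wider range. In practice the weight would be tuned on held-out data.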

Acknowledgement

This work was partially supported by the H2020 OCTAVE Project entitled “Objective Control for TAlker VErification”, funded by the EC under Grant Agreement number 647850.

The authors would like to thank Dr Md Sahidullah, Dr Nicholas Evans and Dr Tomi Kinnunen for their support in this work.

Author information

Corresponding author

Correspondence to Saeid Safavi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Safavi, S., Mporas, I. (2017). Improving Performance of Speaker Identification Systems Using Score Level Fusion of Two Modes of Operation. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_43

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science (R0)