A Comparison of Human and Machine Estimation of Speaker Age

  • Conference paper
Statistical Language and Speech Processing (SLSP 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Abstract

The estimation of the age of a speaker from his or her voice has both forensic and commercial applications. Previous studies have shown that human listeners can estimate the age of a speaker to within 10 years on average, while recent machine age estimation systems appear to perform better, with average errors as low as 6 years. However, the machine studies have used highly non-uniform test sets, for which knowledge of the age distribution gives the system a considerable advantage. In this study we compare human and machine performance on the same test data, chosen to be uniformly distributed in age. We show that in this case human and machine accuracy are more similar, with average errors of 9.8 and 8.6 years respectively, although if panels of listeners are consulted, human accuracy can be improved to a value closer to 7.5 years. Both humans and machines have difficulty in accurately predicting the ages of older speakers.
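
The point about non-uniform test sets can be illustrated with a minimal sketch. It is not the method used in the paper: the age ranges, the Gaussian shape of the skewed set, and all function names below are assumptions chosen purely for illustration, and the data are synthetic. The sketch scores a "system" that ignores the audio entirely and always guesses the median age of the test set, first on an age distribution clustered around young adults and then on a uniform one.

```python
import random
import statistics


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference, in years, between true and predicted ages."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def constant_baseline_mae(ages):
    """Error of a predictor that always outputs the median age of the set.

    This predictor uses no acoustic information at all; it only exploits
    knowledge of the age distribution.
    """
    guess = statistics.median(ages)
    return mean_absolute_error(ages, [guess] * len(ages))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed test set: ages clustered around 28, clipped to 20..80.
skewed_ages = [min(80.0, max(20.0, rng.gauss(28.0, 6.0))) for _ in range(2000)]

# Hypothetical uniform test set over the same 20..80 range.
uniform_ages = [rng.uniform(20.0, 80.0) for _ in range(2000)]

skewed_mae = constant_baseline_mae(skewed_ages)
uniform_mae = constant_baseline_mae(uniform_ages)

print(f"median-guess error on skewed set:  {skewed_mae:.1f} years")
print(f"median-guess error on uniform set: {uniform_mae:.1f} years")
```

Under these assumptions the median guess scores a much smaller error on the skewed set than on the uniform one, even though it has learned nothing about voices. This is why error figures reported on differently distributed test sets are hard to compare directly.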

Author information

Corresponding author

Correspondence to Mark Huckvale.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Huckvale, M., Webb, A. (2015). A Comparison of Human and Machine Estimation of Speaker Age. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_11

  • DOI: https://doi.org/10.1007/978-3-319-25789-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)
