Microphone-Independent Speech Features for Automatic Depression Detection Using Recurrent Neural Network

  • Conference paper
Proceedings of the 8th International Conference on Computational Science and Technology

Abstract

Depression is a common mental disorder that has a negative impact on individuals, society, and the economy. Traditional clinical diagnosis methods are subjective and require extensive expert involvement. Because speech can be collected quickly, conveniently, and non-invasively, it is a promising objective biomarker for automatic depression detection. Acoustic feature extraction is one of the most challenging steps in speech analysis applications on mobile phones. The values of the extracted acoustic features are significantly influenced by adverse environmental noise, a wide range of microphone specifications, and various types of recording software. This study identified microphone-independent acoustic features and used them to develop an end-to-end recurrent neural network model that classifies depression from Bahasa Malaysia speech. The dataset includes 110 female participants. Depression status was determined using the Patient Health Questionnaire-9, the Malay Beck Depression Inventory-II, and subjects’ declaration of a Major Depressive Disorder diagnosis by a trained clinician. Multiple combinations of speech types were compared and discussed. Robust acoustic features derived from female spontaneous speech achieved an accuracy of 85%.
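
To make the microphone dependence described in the abstract concrete, the sketch below models a microphone as a fixed per-frequency power gain and shows that raw log-spectral features differ between two devices, while features with the per-utterance mean subtracted do not. This is a standard mean-normalization illustration and is not the feature selection procedure used in the paper. The gain model, the synthetic spectra, and all function names are assumptions made for this example.

```python
import math
import random


def log_spectrum(frames):
    """Convert per-frame power spectra to log-power features."""
    return [[math.log(p) for p in frame] for frame in frames]


def apply_channel(frames, gains):
    """Simulate a microphone as a fixed per-frequency power gain (assumed model)."""
    return [[p * g for p, g in zip(frame, gains)] for frame in frames]


def mean_normalize(features):
    """Subtract each frequency bin's mean over time (per-utterance normalization)."""
    n_frames = len(features)
    n_bins = len(features[0])
    means = [sum(frame[b] for frame in features) / n_frames for b in range(n_bins)]
    return [[frame[b] - means[b] for b in range(n_bins)] for frame in features]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two feature matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
n_frames, n_bins = 50, 8

# Synthetic stand-in for speech power spectra: strictly positive random values.
clean = [[rng.uniform(0.1, 10.0) for _ in range(n_bins)] for _ in range(n_frames)]

# Two hypothetical microphones with different frequency responses.
mic_a = [rng.uniform(0.2, 2.0) for _ in range(n_bins)]
mic_b = [rng.uniform(0.2, 2.0) for _ in range(n_bins)]

feat_a = log_spectrum(apply_channel(clean, mic_a))
feat_b = log_spectrum(apply_channel(clean, mic_b))

# Raw features carry the device-specific offset log(gain) in every frame.
raw_gap = max_abs_diff(feat_a, feat_b)
# Mean subtraction removes any offset that is constant over time.
norm_gap = max_abs_diff(mean_normalize(feat_a), mean_normalize(feat_b))

print(f"raw feature gap: {raw_gap:.3f}")
print(f"normalized feature gap: {norm_gap:.2e}")
```

Under this simplified model the channel gain becomes an additive constant in the log domain, which is why mean subtraction cancels it. Real recordings also contain additive noise and software processing, which this kind of normalization does not remove.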

Acknowledgements

This work was supported by funding from the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme (FRGS/1/2018/TK04/UIAM/02/7).

Author information

Correspondence to Nik Nur Wahidah Nik Hashim.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ezzi, M.AE.A., Hashim, N.N.W.N., Basri, N.A. (2022). Microphone-Independent Speech Features for Automatic Depression Detection Using Recurrent Neural Network. In: Alfred, R., Lim, Y. (eds) Proceedings of the 8th International Conference on Computational Science and Technology. Lecture Notes in Electrical Engineering, vol 835. Springer, Singapore. https://doi.org/10.1007/978-981-16-8515-6_54
