Microphone-Independent Speech Features for Automatic Depression Detection Using Recurrent Neural Network

Ezzi, Mugahed Al-Ezzi Ahmed; Hashim, Nik Nur Wahidah Nik; Basri, Nadzirah Ahmad

doi:10.1007/978-981-16-8515-6_54

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 835)

Abstract

Depression is a common mental disorder that has a negative impact on individuals, society, and the economy. Traditional clinical diagnosis methods are subjective and require extensive expert participation. Because it is fast, convenient, and non-invasive, automatic depression detection from speech signals is a promising objective biomarker for depression. Acoustic feature extraction is one of the most challenging steps in speech analysis applications on mobile phones. The values of the extracted acoustic features are significantly influenced by adverse environmental noise, a wide range of microphone specifications, and various types of recording software. This study identified microphone-independent acoustic features and used them to develop an end-to-end recurrent neural network model that classifies depression from Bahasa Malaysia speech. The dataset includes 110 female participants. Depression status was determined using the Patient Health Questionnaire-9, the Malay Beck Depression Inventory-II, and subjects’ declaration of a Major Depressive Disorder diagnosis by a trained clinician. Multiple combinations of speech types were compared and discussed. Robust acoustic features derived from female spontaneous speech achieved an accuracy of 85%.
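The abstract does not say how the microphone-independent features were identified, so the following is only an illustrative sketch of one well-known reason a feature can be insensitive to the microphone, and not the authors' method. It assumes a microphone acts as a fixed linear filter, which shows up as a constant per-band offset in log-spectral features; subtracting the per-utterance mean of each band then removes that offset. The band energies, the two "microphone" offsets, and all function names are hypothetical.

```python
import random


def apply_channel(frames, channel_log_gain):
    """Simulate a fixed microphone response as a constant per-band log-domain offset."""
    return [[f[k] + channel_log_gain[k] for k in range(len(f))] for f in frames]


def mean_normalize(frames):
    """Subtract each band's per-utterance mean from every frame."""
    n_frames = len(frames)
    n_bands = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_bands)]
    return [[f[k] - means[k] for k in range(n_bands)] for f in frames]


def max_abs_diff(x, y):
    """Largest absolute element-wise difference between two feature matrices."""
    return max(abs(a - b) for fx, fy in zip(x, y) for a, b in zip(fx, fy))


random.seed(0)
# Hypothetical log-band energies for one utterance: 50 frames x 8 bands.
clean = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]

# The same utterance "recorded" through two made-up microphones.
mic_a = apply_channel(clean, [0.5 * k for k in range(8)])
mic_b = apply_channel(clean, [1.0 - 0.3 * k for k in range(8)])

raw_gap = max_abs_diff(mic_a, mic_b)
norm_gap = max_abs_diff(mean_normalize(mic_a), mean_normalize(mic_b))

print(f"raw feature gap between microphones:        {raw_gap:.3f}")
print(f"gap after per-utterance mean normalization: {norm_gap:.2e}")
```

Under these assumptions the raw features differ substantially between the two simulated microphones, while the normalized features agree up to floating-point error. Real recordings also differ in noise and nonlinear effects, which this sketch does not model.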

Acknowledgements

This work was supported by funding from the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme (FRGS/1/2018/TK04/UIAM/02/7).

Author information

Authors and Affiliations

Department of Mechatronics Engineering, Faculty of Engineering, International Islamic University Malaysia, Gombak, Malaysia
Mugahed Al-Ezzi Ahmed Ezzi & Nik Nur Wahidah Nik Hashim
Department of Psychiatry, Faculty of Medicine, International Islamic University Malaysia, Jalan Hospital, 25000, Kuantan, Pahang, Malaysia
Nadzirah Ahmad Basri

Corresponding author

Correspondence to Nik Nur Wahidah Nik Hashim.

Editor information

Editors and Affiliations

Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
Rayner Alfred
School of Information, Science, Security, Network, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Yuto Lim

About this paper

Cite this paper

Ezzi, M.AE.A., Hashim, N.N.W.N., Basri, N.A. (2022). Microphone-Independent Speech Features for Automatic Depression Detection Using Recurrent Neural Network. In: Alfred, R., Lim, Y. (eds) Proceedings of the 8th International Conference on Computational Science and Technology. Lecture Notes in Electrical Engineering, vol 835. Springer, Singapore. https://doi.org/10.1007/978-981-16-8515-6_54

DOI: https://doi.org/10.1007/978-981-16-8515-6_54
Published: 26 March 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8514-9
Online ISBN: 978-981-16-8515-6
eBook Packages: Intelligent Technologies and Robotics (R0)
