Abstract
Depression is a common mental disorder that has a negative impact on individuals, society, and the economy. Traditional clinical diagnosis methods are subjective and require extensive expert participation. Because it is fast, convenient, and non-invasive, automatic depression detection using speech signals offers a promising objective biomarker for depression. Acoustic feature extraction is one of the most challenging steps in speech analysis applications on mobile phones: the values of the extracted acoustic features are significantly influenced by adverse environmental noise, a wide range of microphone specifications, and various types of recording software. This study identified microphone-independent acoustic features and used them to develop an end-to-end recurrent neural network model that classifies depression from Bahasa Malaysia speech. The dataset includes 110 female participants. Depression status was determined using the Patient Health Questionnaire 9, the Malay Beck Depression Inventory-II, and subjects’ declaration of a Major Depressive Disorder diagnosis by a trained clinician. Multiple combinations of speech types were compared and discussed. Robust acoustic features derived from female spontaneous speech achieved an accuracy of 85%.
Acknowledgements
This work was supported by funding from the Ministry of Higher Education Malaysia under the Fundamental Research Grant Scheme (FRGS/1/2018/TK04/UIAM/02/7).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ezzi, M.AE.A., Hashim, N.N.W.N., Basri, N.A. (2022). Microphone-Independent Speech Features for Automatic Depression Detection Using Recurrent Neural Network. In: Alfred, R., Lim, Y. (eds) Proceedings of the 8th International Conference on Computational Science and Technology. Lecture Notes in Electrical Engineering, vol 835. Springer, Singapore. https://doi.org/10.1007/978-981-16-8515-6_54
DOI: https://doi.org/10.1007/978-981-16-8515-6_54
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8514-9
Online ISBN: 978-981-16-8515-6
eBook Packages: Intelligent Technologies and Robotics (R0)