Abstract
Human emotions, conveyed through textual cues, speech patterns, and facial expressions, can give insight into a person's mental state. Although there are several uni-modal datasets for emotion recognition, labeled datasets for multi-modal depression detection are scarce. Uni-modal emotion recognition datasets can be harnessed, via transfer learning, for multi-modal binary depression detection from video, audio, and text. We propose a deep-learning-based emotion transfer for mood indication framework that addresses binary classification of depression using a one-of-three scheme: if the network's prediction for at least one modality is the depressed class, the final output is considered depressed. Such a scheme is beneficial because it detects an abnormality in any single modality and can alert a user to seek help well in advance. Long short-term memory networks are used to capture the temporal aspects of the audio and video modalities and the context of the text. The network is then fine-tuned on a binary depression detection dataset that was independently labeled using a standard questionnaire employed by psychologists. Data augmentation techniques are used to improve generalization and to resolve class imbalance. Our experiments show that our method for binary depression classification, using an ensemble of the three modalities, achieves higher accuracy than other benchmark methods on the Distress Analysis Interview Corpus—Wizard of Oz dataset.
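The one-of-three decision rule described above can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name and the label encoding (1 = depressed, 0 = not depressed) are assumptions for the example.

```python
def one_of_three(text_pred: int, audio_pred: int, video_pred: int) -> int:
    """One-of-three fusion: output 'depressed' (1) if the classifier
    for at least one modality predicts the depressed class.

    Each argument is a binary prediction: 1 = depressed, 0 = not depressed.
    """
    return int(any((text_pred, audio_pred, video_pred)))


# Example: only the audio model flags depression, so the ensemble does too.
print(one_of_three(text_pred=0, audio_pred=1, video_pred=0))  # prints 1
print(one_of_three(text_pred=0, audio_pred=0, video_pred=0))  # prints 0
```

This rule trades precision for recall: a single alarming modality is enough to raise an alert, which suits an early-warning setting.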
Availability of data and material (data transparency)
We have used already available public data for this work.
Code availability (software application or custom code)
We have not released the code.
Acknowledgements
We thank the reviewers for their detailed comments, which have greatly enhanced the presentation of the paper.
Funding
No external funding was received for this work.
Author information
Contributions
All the authors have contributed to this work in the order in which their names are mentioned.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Prabhu, S., Mittal, H., Varagani, R. et al. Harnessing emotions for depression detection. Pattern Anal Applic 25, 537–547 (2022). https://doi.org/10.1007/s10044-021-01020-9