Abstract
Human emotions, conveyed through textual cues, speech patterns, and facial expressions, can give insight into a person's mental state. Although there are several uni-modal datasets for emotion recognition, labeled datasets for multi-modal depression detection are scarce. Uni-modal emotion recognition datasets can be harnessed, via transfer learning, for multi-modal binary depression detection from video, audio, and text. We propose a deep-learning-based emotion transfer for mood indication framework that addresses binary classification of depression using a one-of-three scheme: if the network's prediction for at least one modality is the depressed class, the final output is considered depressed. Such a scheme is beneficial because it detects an abnormality in any single modality and can alert a user to seek help well in advance. Long short-term memory networks are used to capture the temporal aspects of the audio and video modalities and the context of the text. The network is then fine-tuned on a binary depression detection dataset that was independently labeled using a standard questionnaire employed by psychologists. Data augmentation techniques are used to improve generalization and to resolve class imbalance. Our experiments show that our method for binary depression classification, using an ensemble of the three modalities, achieves higher accuracy than other benchmark methods on the Distress Analysis Interview Corpus—Wizard of Oz dataset.
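The one-of-three decision rule described above can be sketched as follows. This is a minimal illustration, not the authors' released code; the function name and the label encoding (1 = depressed, 0 = not depressed) are assumptions for the example.

```python
def one_of_three(text_pred: int, audio_pred: int, video_pred: int) -> int:
    """One-of-three fusion: output 'depressed' (1) if the classifier
    for at least one modality predicts the depressed class.

    Each argument is a binary prediction: 1 = depressed, 0 = not depressed.
    """
    return int(any((text_pred, audio_pred, video_pred)))


# Example: only the audio model flags depression, so the ensemble does too.
print(one_of_three(text_pred=0, audio_pred=1, video_pred=0))  # prints 1
print(one_of_three(text_pred=0, audio_pred=0, video_pred=0))  # prints 0
```

This rule trades precision for recall: a single alarming modality is enough to raise an alert, which suits an early-warning setting.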
Availability of data and material (data transparency)
We have used already available public data for this work.
Code availability (software application or custom code)
We have not released the code.
Acknowledgements
We thank the reviewers for their detailed comments, which have greatly enhanced the presentation of the paper.
Funding
No external funding was received for this work.
Author information
Contributions
All the authors have contributed to this work in the order in which their names are mentioned.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Prabhu, S., Mittal, H., Varagani, R. et al. Harnessing emotions for depression detection. Pattern Anal Applic 25, 537–547 (2022). https://doi.org/10.1007/s10044-021-01020-9