Emotion Recognition Based on Multimodal Information

Chapter in Affective Information Processing

Abstract

The following is an exchange between an interviewer and a subject during an Adult Attachment Interview (Roisman, Tsai, & Chiang, 2004). AUs are facial action units as defined in Ekman, Friesen, and Hager (2002).

The interviewer asked: “Now, please choose five adjectives to describe your childhood relationship with your mother when you were about five years old, or as far back as you can remember.”
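
For readers unfamiliar with FACS notation, here is a minimal sketch mapping a few commonly cited AU codes to their standard FACS names (Ekman, Friesen, & Hager, 2002). The selection of AUs and the helper function are our illustration, not part of the chapter.

    # Illustrative only: a few standard FACS action units (AUs) and their
    # names per Ekman, Friesen, & Hager (2002); the helper is hypothetical.
    FACS_AUS = {
        1: "inner brow raiser",
        2: "outer brow raiser",
        4: "brow lowerer",
        6: "cheek raiser",
        12: "lip corner puller",
        15: "lip corner depressor",
    }

    def describe_aus(codes):
        """Render a coded AU combination, e.g. AU6 + AU12 for a Duchenne smile."""
        return " + ".join(f"AU{c} ({FACS_AUS.get(c, 'unlisted')})" for c in sorted(codes))

    print(describe_aus([6, 12]))  # AU6 (cheek raiser) + AU12 (lip corner puller)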

References

  • Adams, R. B., & Kleck, R. E. (2003). Perceived gaze direction and the processing of facial displays of emotion. Psychological Science, 14, 644–647.

  • Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256–274.

  • Balomenos, T., Raouzaiou, A., Ioannou, S., Drosopoulos, A., Karpouzis, K., & Kollias, S. (2005). Emotion analysis in man-machine interaction systems (LNCS 3361; pp. 318–328). New York: Springer.

  • Bartlett, M. S., Littlewort, G., Frank, M., Lainscsek, C., Fasel, I., & Movellan, J. (2005). Recognizing facial expression: Machine learning and application to spontaneous behavior. In IEEE International Conference on Computer Vision and Pattern Recognition (pp. 568–573).

  • Batliner, A., Fischer, K., Huber, R., Spilker, J., & Nöth, E. (2003). How to find trouble in communication. Speech Communication, 40, 117–143.

  • Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M., et al. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the International Conference on Multimodal Interfaces (pp. 205–211).

  • Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Raouzaiou, A., & Karpouzis, K. (2006). Modeling naturalistic affective states via facial and vocal expression recognition. In Proceedings of the International Conference on Multimodal Interfaces (pp. 146–154).

  • Chen, L., Huang, T. S., Miyasato, T., & Nakatsu, R. (1998). Multimodal human emotion/expression recognition. In Proceedings of the International Conference on Automatic Face and Gesture Recognition (pp. 396–401).

  • Chen, L. S. (2000). Joint processing of audio-visual information for the recognition of emotional expressions in human-computer interaction. PhD thesis, University of Illinois at Urbana-Champaign, USA.

  • Cohn, J. F. (2006). Foundations of human computing: Facial expression and emotion. In Proceedings of the International Conference on Multimodal Interfaces (pp. 233–238).

  • Cohn, J. F., Reed, L. I., Ambadar, Z., Xiao, J., & Moriyama, T. (2004). Automatic analysis and recognition of brow actions and head motion in spontaneous facial behavior. In Proceedings of the International Conference on Systems, Man and Cybernetics, 1 (pp. 610–616).

  • Cohn, J. F., & Schmidt, K. L. (2004). The timing of facial motion in posed and spontaneous smiles. International Journal of Wavelets, Multiresolution and Information Processing, 2, 1–12.

  • Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). Beyond emotion archetypes: Databases for emotion modeling using neural networks. Neural Networks, 18, 371–388.

  • Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schröder, M. (2000). ‘Feeltrace’: An instrument for recording perceived emotion in real time. In Proceedings of the ISCA Workshop on Speech and Emotion (pp. 19–24).

  • Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., & Taylor, J. G. (2001). Emotion recognition in human-computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.

  • Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40(1–2), 33–60.

  • Duric, Z., Gray, W. D., Heishman, R., Li, F., Rosenfeld, A., Schoelles, M. J., Schunn, C., & Wechsler, H. (2002). Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction. Proceedings of the IEEE, 90(7), 1272–1289.

  • Ekman, P. (Ed.) (1982). Emotion in the human face (2nd ed.). New York: Cambridge University Press.

  • Ekman, P., & Friesen, W. V. (1975). Unmasking the face. Englewood Cliffs, NJ: Prentice-Hall.

  • Ekman, P., Friesen, W. V., & Hager, J. C. (2002). Facial Action Coding System. Salt Lake City, UT: A Human Face.

  • Ekman, P., & Rosenberg, E. L. (2005). What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (2nd ed.). New York: Oxford University Press.

  • Fragopanagos, N., & Taylor, J. G. (2005). Emotion recognition in human-computer interaction. Neural Networks, 18, 389–405.

  • Go, H. J., Kwak, K. C., Lee, D. J., & Chun, M. G. (2003). Emotion recognition from facial image and speech signal. In Proceedings of the International Conference of the Society of Instrument and Control Engineers (pp. 2890–2895).

  • Graciarena, M., Shriberg, E., Stolcke, A., Enos, F., Hirschberg, J., & Kajarekar, S. (2006). Combining prosodic, lexical and cepstral systems for deceptive speech detection. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, I (pp. 1033–1036).

  • Gunes, H., & Piccardi, M. (2005). Affect recognition from face and body: Early fusion vs. late fusion. In Proceedings of the International Conference on Systems, Man and Cybernetics (pp. 3437–3443).

  • Gunes, H., & Piccardi, M. (2006). A bimodal face and body gesture database for automatic analysis of human nonverbal affective behavior. In International Conference on Pattern Recognition, 1 (pp. 1148–1153).

  • Harrigan, J. A., Rosenthal, R., & Scherer, K. R. (Eds.) (2005). The new handbook of methods in nonverbal behavior research. New York: Oxford University Press.

  • Hoch, S., Althoff, F., McGlaun, G., & Rigoll, G. (2005). Bimodal fusion of emotional data in an automotive environment. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, II (pp. 1085–1088).

  • Ji, Q., Lan, P., & Looney, C. (2006). A probabilistic framework for modeling and real-time monitoring human fatigue. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 36(5), 862–875.

  • Kapoor, A., Burleson, W., & Picard, R. W. (2007). Automatic prediction of frustration. International Journal of Human-Computer Studies, 65(8), 724–736.

  • Kapoor, A., & Picard, R. W. (2005). Multimodal affect recognition in learning environments. In ACM International Conference on Multimedia (pp. 677–682).

  • Karpouzis, K., Caridakis, G., Kessous, L., Amir, N., Raouzaiou, A., Malatesta, L., & Kollias, S. (2007). Modeling naturalistic affective states via facial, vocal, and bodily expression recognition (LNAI 4451; pp. 91–112). New York: Springer.

  • Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. Hoboken, NJ: John Wiley and Sons.

  • Lee, C. M., & Narayanan, S. S. (2005). Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2), 293–303.

  • Liao, W., Zhang, W., Zhu, Z., Ji, Q., & Gray, W. D. (2006). Toward a decision-theoretic framework for affect recognition and user assistance. International Journal of Human-Computer Studies, 64(9), 847–873.

  • Lisetti, C. L., & Nasoz, F. (2002). MAUI: A multimodal affective user interface. In Proceedings of the International Conference on Multimedia (pp. 161–170).

  • Lisetti, C. L., & Nasoz, F. (2004). Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP Journal on Applied Signal Processing, 11, 1672–1687.

  • Litman, D. J., & Forbes-Riley, K. (2004). Predicting student emotions in computer-human tutoring dialogues. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (pp. 352–359).

  • Littlewort, G. C., Bartlett, M. S., & Lee, K. (2007). Faces of pain: Automated measurement of spontaneous facial expressions of genuine and posed pain. In Proceedings of the ACM International Conference on Multimodal Interfaces (pp. 15–21).

  • Maat, L., & Pantic, M. (2006). Gaze-X: Adaptive affective multimodal interface for single-user office scenarios. In Proceedings of the ACM International Conference on Multimodal Interfaces (pp. 171–178).

  • Pal, P., Iyer, A. N., & Yantorno, R. E. (2006). Emotion detection from infant facial expressions and cries. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2 (pp. 721–724).

  • Pantic, M., & Bartlett, M. S. (2007). Machine analysis of facial expressions. In K. Delac & M. Grgic (Eds.), Face recognition (pp. 377–416). Vienna, Austria: I-Tech Education.

  • Pantic, M., Pentland, A., Nijholt, A., & Huang, T. S. (2006). Human computing and machine understanding of human behavior: A survey. In International Conference on Multimodal Interfaces (pp. 239–248).

  • Pantic, M., & Rothkrantz, L. J. M. (2003). Toward an affect-sensitive multimodal human-computer interaction. Proceedings of the IEEE, 91(9), 1370–1390.

  • Pantic, M., & Rothkrantz, L. J. M. (2004). Case-based reasoning for user-profiled recognition of emotions from face images. In International Conference on Multimedia and Expo (pp. 391–394).

  • Pantic, M., Valstar, M. F., Rademaker, R., & Maat, L. (2005). Web-based database for facial expression analysis. In International Conference on Multimedia and Expo (pp. 317–321).

  • Patras, I., & Pantic, M. (2004). Particle filtering with factorized likelihoods for tracking facial features. In Proceedings of the IEEE International Conference on Face and Gesture Recognition (pp. 97–102).

  • Pentland, A. (2005). Socially aware computation and communication. IEEE Computer, 38, 33–40.

  • Petridis, S., & Pantic, M. (2008). Audiovisual discrimination between laughter and speech. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 5117–5120).

  • Picard, R. W. (1997). Affective computing. Cambridge, MA: MIT Press.

  • Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10), 1175–1191.

  • Pitt, M. K., & Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94, 590–599.

  • Roisman, G. I., Tsai, J. L., & Chiang, K. S. (2004). The emotional integration of childhood experience: Physiological, facial expressive, and self-reported emotional response during the Adult Attachment Interview. Developmental Psychology, 40(5), 776–789.

  • Russell, J. A., Bachorowski, J., & Fernandez-Dols, J. (2003). Facial and vocal expressions of emotion. Annual Review of Psychology, 54, 329–349.

  • Scherer, K. R. (1999). Appraisal theory. In T. Dalgleish & M. J. Power (Eds.), Handbook of cognition and emotion (pp. 637–663). New York: Wiley.

  • Schuller, B., Villar, R. J., Rigoll, G., & Lang, M. (2005). Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition. In International Conference on Acoustics, Speech and Signal Processing (pp. 325–328).

  • Sebe, N., Cohen, I., Gevers, T., & Huang, T. S. (2006). Emotion recognition based on joint visual and audio cues. In International Conference on Pattern Recognition (pp. 1136–1139).

  • Sebe, N., Cohen, I., & Huang, T. S. (2005). Multimodal emotion recognition. In Handbook of pattern recognition and computer vision. Singapore: World Scientific.

  • Song, M., Bu, J., Chen, C., & Li, N. (2004). Audio-visual based emotion recognition: A new approach. In International Conference on Computer Vision and Pattern Recognition (pp. 1020–1025).

  • Stein, B., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.

  • Stemmler, G. (2003). Methodological considerations in the psychophysiological study of emotion. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 225–255). New York: Oxford University Press.

  • Tao, H., & Huang, T. S. (1999). Explanation-based facial motion tracking using a piecewise Bézier volume deformation model. In IEEE Conference on Computer Vision and Pattern Recognition, 1 (pp. 611–617).

  • Truong, K. P., & van Leeuwen, D. A. (2007). Automatic discrimination between laughter and speech. Speech Communication, 49, 144–158.

  • Valstar, M. F., Gunes, H., & Pantic, M. (2007). How to distinguish posed from spontaneous smiles using geometric features. In ACM International Conference on Multimodal Interfaces (pp. 38–45).

  • Valstar, M., Pantic, M., Ambadar, Z., & Cohn, J. F. (2006). Spontaneous vs. posed facial behavior: Automatic analysis of brow actions. In International Conference on Multimodal Interfaces (pp. 162–170).

  • Valstar, M. F., & Pantic, M. (2006). Fully automatic facial action unit detection and temporal analysis. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 3 (p. 149).

  • Wang, Y., & Guan, L. (2005). Recognizing human emotion from audiovisual information. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, II (pp. 1125–1128).

  • Whissell, C. M. (1989). The dictionary of affect in language. In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, research and experience. Vol. 4: The measurement of emotions (pp. 113–131). New York: Academic Press.

  • Xiao, J., Moriyama, T., Kanade, T., & Cohn, J. F. (2003). Robust full-motion recovery of head by dynamic templates and re-registration techniques. International Journal of Imaging Systems and Technology, 13(1), 85–94.

  • Yoshimoto, D., Shapiro, A., O'Brian, K., & Gottman, J. M. (2005). Nonverbal communication coding systems of committed couples. In J. A. Harrigan, R. Rosenthal, & K. R. Scherer (Eds.), The new handbook of methods in nonverbal behavior research (pp. 369–397). New York: Oxford University Press.

  • Zeng, Z., Hu, Y., Liu, M., Fu, Y., & Huang, T. S. (2006). Training combination strategy of multi-stream fused hidden Markov model for audio-visual affect recognition. In Proceedings of the ACM International Conference on Multimedia (pp. 65–68).

  • Zeng, Z., Hu, Y., Roisman, G. I., Wen, Z., Fu, Y., & Huang, T. S. (2007a). Audio-visual spontaneous emotion recognition. In T. S. Huang, A. Nijholt, M. Pantic, & A. Pentland (Eds.), Artificial intelligence for human computing (LNAI 4451; pp. 72–90). New York: Springer.

  • Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2008a). A survey of affect recognition methods: Audio, visual and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence (in press).

  • Zeng, Z., Tu, J., Liu, M., Zhang, T., Rizzolo, N., Zhang, Z., Huang, T. S., Roth, D., & Levinson, S. (2004). Bimodal HCI-related emotion recognition. In International Conference on Multimodal Interfaces (pp. 137–143).

  • Zeng, Z., Tu, J., Liu, M., Huang, T. S., Pianfetti, B., Roth, D., & Levinson, S. (2007b). Audio-visual affect recognition. IEEE Transactions on Multimedia, 9(2), 424–428.

  • Zeng, Z., Tu, J., Pianfetti, B., & Huang, T. S. (2008b). Audio-visual affective expression recognition through multi-stream fused HMM. IEEE Transactions on Multimedia, 10(4), 570–577.

  • Zhang, Y., & Ji, Q. (2005). Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 699–714.

Copyright information

© 2009 Springer-Verlag London Limited

About this chapter

Cite this chapter

Zeng, Z., Pantic, M., Huang, T.S. (2009). Emotion Recognition Based on Multimodal Information. In: Tao, J., Tan, T. (eds) Affective Information Processing. Springer, London. https://doi.org/10.1007/978-1-84800-306-4_14

  • DOI: https://doi.org/10.1007/978-1-84800-306-4_14

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-305-7

  • Online ISBN: 978-1-84800-306-4

  • eBook Packages: Computer Science (R0)
