Abstract
Affective computing is computing that relates to, arises from, or deliberately influences emotion [1]; it seeks to give computers the human-like capabilities of observing, interpreting, and generating affective features. It is an important topic in human–computer interaction (HCI) because it helps improve the quality of communication between humans and computers.
References
Picard, R. W. (1997). Affective Computing. MIT Press, Cambridge, MA.
James, W. (1884). What is an emotion? Mind, 9(34), 188–205.
Oatley, K. (1987). Cognitive science and the understanding of emotions. Cogn. Emotion, 3(1), 209–216.
Bigun, E. S., Bigun, J., Duc, B., Fischer, S. (1997). Expert conciliation for multimodal person authentication systems using Bayesian statistics. In: Int. Conf. on Audio and Video-Based Biometric Person Authentication (AVBPA), Crans-Montana, Switzerland, 291–300.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychol. Bull., 99(2), 143–165.
Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Commun., 40, 227–256.
Scherer, K. R., Banse, R., Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. J. Cross-Cultural Psychol., 32 (1), 76–92.
Johnstone, T., van Reekum, C. M., Scherer, K. R. (2001). Vocal correlates of appraisal processes. In: Scherer, K. R., Schorr, A., Johnstone, T. (eds) Appraisal Processes in Emotion: Theory, Methods, Research. Oxford University Press, New York and Oxford, 271–284.
Petrushin, V. A. (2000). Emotion recognition in speech signal: Experimental study, development and application. In: 6th Int. Conf. on Spoken Language Processing, ICSLP2000, Beijing, 222–225.
Gobl, C., Chasaide, A. N. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Commun., 40(1-2), 189–212.
Tato, R., Santos, R., Kompe, R., Pardo, J. M. (2002). Emotional space improves emotion recognition. In: ICSLP2002, Denver, CO, 2029–2032.
Dellaert, F., Polzin, T., Waibel, A. (1996). Recognizing emotion in speech. In: ICSLP 1996, Philadelphia, PA, 1970–1973.
Lee, C. M., Narayanan, S., Pieraccini, R. (2001). Recognition of negative emotions from the speech signal. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
Yu, F., Chang, E., Xu, Y. Q., Shum H. Y. (2001). Emotion detection from speech to enrich multimedia content. In: The 2nd IEEE Pacific-Rim Conf. on Multimedia, Beijing, China, 550–557.
Campbell, N. (2004). Perception of affect in speech – towards an automatic processing of paralinguistic information in spoken conversation. In: ICSLP2004, Jeju, 881–884.
Cahn, J. E. (1990). The generation of affect in synthesized speech. J. Am. Voice I/O Soc., 8, 1–19.
Schröder, M. (2001). Emotional speech synthesis: A review. In: Eurospeech 2001, Aalborg, Denmark, 561–564.
Campbell, N. (2004). Synthesis units for conversational speech – using phrasal segments. In: Autumn Meet. Acoust. Soc. Jpn., 337–338.
Schröder, M., Breuer, S. (2004). XML representation languages as a way of interconnecting TTS modules. In: 8th Int. Conf. on Spoken Language Processing, ICSLP'04, Jeju, Korea.
Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., Pitrelli, J. (2002). A corpus-based approach to <ahem/> expressive speech synthesis. In: IEEE Speech Synthesis Workshop, Santa Monica, 79–84.
Chuang, Z. J., Wu, C. H. (2002). Emotion recognition from textual input using an emotional semantic network. In: Int. Conf. on Spoken Language Processing, ICSLP 2002, Denver, 177–180.
Tao, J. (2003). Emotion control of Chinese speech synthesis in natural environment. In: Eurospeech 2003, Geneva.
Moriyama, T., Ozawa, S. (1999). Emotion recognition and synthesis system on speech. In: IEEE Int. Conf. on Multimedia Computing and Systems, Florence, Italy, 840–844.
Massaro, D. W., Beskow, J., Cohen, M. M., Fry, C. L., Rodriguez, T. (1999). Picture my voice: Audio to visual speech synthesis using artificial neural networks. In: AVSP’99, Santa Cruz, CA, 133–138.
Darwin, C. (1872). The Expression of the Emotions in Man and Animals. University of Chicago Press, Chicago.
Etcoff, N. L., Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, 44, 227–240.
Ekman, P., Friesen, W. V. (1997). Manual for the Facial Action Coding System. Consulting Psychologists Press, Palo Alto, CA.
Yamamoto, E., Nakamura, S., Shikano, K. (1998). Lip movement synthesis from speech based on Hidden Markov Models. Speech Commun., 26, 105–115.
Tekalp, A. M., Ostermann, J. (2000). Face and 2-D mesh animation in MPEG-4. Signal Process.: Image Commun., 15, 387–421.
Lyons, M. J., Akamatsu, S., Kamachi, M., Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In: 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, 200–205.
Calder, A. J., Burton, A. M., Miller, P., Young, A. W., Akamatsu, S. (2001). A principal component analysis of facial expression. Vis. Res., 41, 1179–1208.
Kobayashi, H., Hara, F. (1992). Recognition of six basic facial expressions and their strength by neural network. In: Intl. Workshop on Robotics and Human Communications, New York, 381–386.
Bregler, C., Covell, M., Slaney, M. (1997). Video rewrite: Driving visual speech with audio. In: ACM SIGGRAPH’97, Los Angeles, CA, 353–360.
Cosatto, E., Potamianos, G., Graf, H. P. (2000). Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE Int. Conf. on Multimedia and Expo, New York, 619–622.
Ezzat, T., Poggio, T. (1998). MikeTalk: A talking facial display based on morphing visemes. In: Computer Animation Conf., Philadelphia, PA, 456–459.
Gutierrez-Osuna, R., Rudomin, J. L. (2005). Speech-driven facial animation with realistic dynamics. IEEE Trans. Multimedia, 7, 33–42.
Hong, P. Y., Wen, Z., Huang, T. S. (2002). Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Netw., 13, 916–927.
Verma, A., Subramaniam, L. V., Rajput, N., Neti, C., Faruquie, T. A. (2004). Animating expressive faces across languages. IEEE Trans. Multimedia, 6, 791–800.
Collier, G. (1985). Emotional Expression. Lawrence Erlbaum Associates. http://faculty.uccb.ns.ca/~gcollier/
Argyle, M. (1988). Bodily Communication. Methuen & Co, New York, NY.
Siegman, A. W., Feldstein, S. (1985). Multichannel Integrations of Nonverbal Behavior, Lawrence Erlbaum Associates, Hillsdale, NJ.
Feldman, R. S., Philippot, P., Custrini, R. J. (1991). Social competence and nonverbal behavior. In: Rimé, R. S. F. B. (ed) Fundamentals of Nonverbal Behavior. Cambridge University Press, Cambridge, 329–350.
Knapp, M. L., Hall, J. A. (2006). Nonverbal Communication in Human Interaction, 6th edn. Thomson Wadsworth, Belmont, CA.
Go, H. J., Kwak, K. C., Lee, D. J., Chun, M. G. (2003). Emotion recognition from facial image and speech signal. In: Int. Conf. Society of Instrument and Control Engineers, Fukui, Japan, 2890–2895.
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M. et al. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Int. Conf. on Multimodal Interfaces, State College, PA, 205–211.
Song, M., Bu, J., Chen, C., Li, N. (2004). Audio-visual based emotion recognition – A new approach. In: Int. Conf. on Computer Vision and Pattern Recognition, Washington, DC, USA, 1020–1025.
Zeng, Z., Tu, J., Liu, M., Zhang, T., Rizzolo, N., Zhang, Z., Huang, T. S., Roth, D., Levinson, S. (2004). Bimodal HCI-related emotion recognition. In: Int. Conf. on Multimodal Interfaces, State College, PA, 137–143.
Zeng, Z., Tu, J., Pianfetti, B., Huang, T. S. Audio-visual affective expression recognition through multi-stream fused HMM. IEEE Trans. Multimedia, 10(4), 570–577.
Zeng, Z., Tu, J., Liu, M., Huang, T. S., Pianfetti, B., Roth D., Levinson, S. (2007). Audio-visual affect recognition. IEEE Trans. Multimedia, 9 (2), 424–428.
Wang, Y., Guan, L. (2005). Recognizing human emotion from audiovisual information. In: ICASSP, Philadelphia, PA, Vol. II, 1125–1128.
Hoch, S., Althoff, F., McGlaun, G., Rigoll, G. (2005). Bimodal fusion of emotional data in an automotive environment. In: ICASSP, Philadelphia, PA, Vol. II, 1085–1088.
Fragopanagos, F., Taylor, J. G. (2005). Emotion recognition in human-computer interaction. Neural Netw., 18, 389–405.
Pal, P., Iyer, A. N., Yantorno, R. E. (2006). Emotion detection from infant facial expressions and cries. In: Proc. Int’l Conf. on Acoustics, Speech & Signal Processing, Philadelphia, PA, 2, 721–724.
Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Paouzaiou, A., Karpouzis, K. (2006). Modeling naturalistic affective states via facial and vocal expression recognition. In: Int. Conf. on Multimodal Interfaces, Banff, Alberta, Canada, 146–154.
Karpouzis, K., Caridakis, G., Kessous, L., Amir, N., Raouzaiou, A., Malatesta, L., Kollias, S. (2007). Modeling naturalistic affective states via facial, vocal, and bodily expression recognition. In: Lecture Notes in Artificial Intelligence, vol. 4451, 91–112.
Chen, C. Y., Huang, Y. K., Cook, P. (2005). Visual/Acoustic emotion recognition. In: Proc. Int. Conf. on Multimedia and Expo, Amsterdam, Netherlands, 1468–1471.
Picard, R. W. (2003). Affective computing: Challenges. Int. J. Hum. Comput. Studies, vol. 59, 55–64.
Ortony, A., Clore, G. L., Collins, A. (1990). The Cognitive Structure of Emotions. Cambridge University Press, Cambridge.
Carberry, S., de Rosis, F. (2008). Introduction to the Special Issue of UMUAI on ‘Affective Modeling and Adaptation’, International Journal of User Modeling and User-Adapted Interaction, vol. 18, 1–9.
Esposito, A., Balodis, G., Ferreira, A., Cristea, G. (2006). Cross-Modal Analysis of Verbal and Non-verbal Communication. Proposal for a COST Action.
Yin, P. R., Tao, J. H. (2005). Dynamic mapping method based speech driven face animation system. In: The 1st Int. Conf. on Affective Computing and Intelligent Interaction (ACII2005), Beijing, 755–763.
O’Brien, J. F., Bodenheimer, B., Brostow, G., Hodgins, J. (2000). Automatic joint parameter estimation from magnetic motion capture data. In: Graphics Interface 2000, Montreal, Canada, 53–60.
Aggarwal, J. K., Cai, Q. (1999). Human motion analysis: A review. Comput. Vision Image Understand., 73(3), 428–440.
Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Comput. Vision Image Understand., 73(1), 82–98.
Azarbayejani, A., Wren, C., Pentland, A. (1996). Real-time 3-D tracking of the human body. In: IMAGE’COM 96, Bordeaux, France.
Camurri, A., Poli, G. D., Leman, M., Volpe, G. (2001). A multi-layered conceptual framework for expressive gesture applications. In: Intl. EU-TMR MOSART Workshop, Barcelona.
Cowie, R. (2001). Emotion recognition in human-computer interaction. IEEE Signal Process. Mag., 18(1), 32–80.
Brunelli, R., Falavigna, D. (1995). Person identification using multiple cues. IEEE Trans. Pattern Anal. Mach. Intell., 17(10), 955–966.
Kumar, A., Wong, D. C., Shen, H. C., Jain, A. K. (2003). Personal verification using palmprint and hand geometry biometric. In: 4th Int. Conf. on Audio- and Video-based Biometric Person Authentication, Guildford, UK, 668–678.
Frischholz, R. W., Dieckmann, U. (2000). BioID: A multimodal biometric identification system. IEEE Comput., 33(2), 64–68.
Jain, A. K., Ross, A. (2002). Learning user-specific parameters in a multibiometric system. In: Int. Conf. on Image Processing (ICIP), Rochester, New York, 57–60.
Ho, T. K., Hull, J. J., Srihari, S. N. (1994). Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell., 16(1), 66–75.
Kittler, J., Hatef, M., Duin, R. P. W., Matas, J. (1998). On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell., 20(3), 226–239.
Dieckmann, U., Plankensteiner, P., Wagner, T. (1997). SESAM: A biometric person identification system using sensor fusion. Pattern Recognit. Lett., 18, 827–833.
De Silva, L. C., Miyasato, T., Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In: Proc. Int. Conf. on Information and Communications and Signal Processing, Singapore, 397–401.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 60575032 and the 863 program under Grant 2006AA01Z138.
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Tao, J. (2010). Multimodal Information Processing for Affective Computing. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_9
Print ISBN: 978-0-387-73818-5
Online ISBN: 978-0-387-73819-2