Abstract
Affective computing is computing that relates to, arises from, or deliberately influences emotion [1]; it seeks to give computers the human-like capabilities of observing, interpreting, and generating affective features. It is an important topic in human–computer interaction (HCI) because it helps improve the quality of communication between humans and computers.
References
Picard, R. W. (1997). Affective Computing. MIT Press, Cambridge, MA.
James, W. (1884). What is an emotion? Mind, 9(34), 188–205.
Oatley, K. (1987). Cognitive science and the understanding of emotions. Cogn. Emotion, 3(1), 209–216.
Bigun, E. S., Bigun, J., Duc, B., Fischer, S. (1997). Expert conciliation for multimodal person authentication systems using Bayesian statistics. In: Int. Conf. on Audio and Video-Based Biometric Person Authentication (AVBPA), Crans-Montana, Switzerland, 291–300.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychol. Bull., 99(2), 143–165.
Scherer, K. R. (2003). Vocal communication of emotion: A review of research paradigms. Speech Commun., 40, 227–256.
Scherer, K. R., Banse, R., Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across languages and cultures. J. Cross-Cultural Psychol., 32 (1), 76–92.
Johnstone, T., van Reekum, C. M., Scherer, K. R. (2001). Vocal correlates of appraisal processes. In: Scherer, K. R., Schorr, A., Johnstone, T. (eds) Appraisal Processes in Emotion: Theory, Methods, Research. Oxford University Press, New York and Oxford, 271–284.
Petrushin, V. A. (2000). Emotion recognition in speech signal: Experimental study, development and application. In: 6th Int. Conf. on Spoken Language Processing, ICSLP2000, Beijing, 222–225.
Gobl, C., Chasaide, A. N. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Commun., 40(1-2), 189–212.
Tato, R., Santos, R., Kompe, R., Pardo, J. M. (2002). Emotional space improves emotion recognition. In: ICSLP2002, Denver, CO, 2029–2032.
Dellaert, F., Polzin, T., Waibel, A. (1996). Recognizing emotion in speech. In: ICSLP 1996, Philadelphia, PA, 1970–1973.
Lee, C. M., Narayanan, S., Pieraccini, R. (2001). Recognition of negative emotions from the speech signal. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
Yu, F., Chang, E., Xu, Y. Q., Shum H. Y. (2001). Emotion detection from speech to enrich multimedia content. In: The 2nd IEEE Pacific-Rim Conf. on Multimedia, Beijing, China, 550–557.
Campbell, N. (2004). Perception of affect in speech – towards an automatic processing of paralinguistic information in spoken conversation. In: ICSLP2004, Jeju, 881–884.
Cahn, J. E. (1990). The generation of affect in synthesized speech. J. Am. Voice I/O Soc., 8, 1–19.
Schröder, M. (2001). Emotional speech synthesis: A review. In: Eurospeech 2001, Aalborg, Denmark, 561–564.
Campbell, N. (2004). Synthesis units for conversational speech – using phrasal segments. In: Autumn Meet. Acoust. Soc. Jpn., 337–338.
Schröder, M., Breuer, S. (2004). XML representation languages as a way of interconnecting TTS modules. In: 8th Int. Conf. on Spoken Language Processing, ICSLP'04, Jeju, Korea.
Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., Pitrelli, J. (2002). A corpus-based approach to <ahem/> expressive speech synthesis. In: IEEE Speech Synthesis Workshop, Santa Monica, 79–84.
Chuang, Z. J., Wu, C. H. (2002). Emotion recognition from textual input using an emotional semantic network. In: Int. Conf. on Spoken Language Processing, ICSLP 2002, Denver, 177–180.
Tao, J. (2003). Emotion control of Chinese speech synthesis in natural environment. In: Eurospeech 2003, Geneva.
Moriyama, T., Ozawa, S. (1999). Emotion recognition and synthesis system on speech. In: IEEE Int. Conf. on Multimedia Computing and Systems, Florence, Italy, 840–844.
Massaro, D. W., Beskow, J., Cohen, M. M., Fry, C. L., Rodriguez, T. (1999). Picture my voice: Audio to visual speech synthesis using artificial neural networks. In: AVSP’99, Santa Cruz, CA, 133–138.
Darwin, C. (1872). The Expression of the Emotions in Man and Animals. University of Chicago Press, Chicago.
Etcoff, N. L., Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, 44, 227–240.
Ekman, P., Friesen, W. V. (1997). Manual for the Facial Action Coding System. Consulting Psychologists Press, Palo Alto, CA.
Yamamoto, E., Nakamura, S., Shikano, K. (1998). Lip movement synthesis from speech based on Hidden Markov Models. Speech Commun., 26, 105–115.
Tekalp, A. M., Ostermann, J. (2000). Face and 2-D mesh animation in MPEG-4. Signal Process.: Image Commun., 15, 387–421.
Lyons, M. J., Akamatsu, S., Kamachi, M., Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In: 3rd IEEE Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, 200–205.
Calder, A. J., Burton, A. M., Miller, P., Young, A. W., Akamatsu, S. (2001). A principal component analysis of facial expression. Vis. Res., 41, 1179–1208.
Kobayashi, H., Hara, F. (1992). Recognition of six basic facial expressions and their strength by neural network. In: Intl. Workshop on Robotics and Human Communications, New York, 381–386.
Bregler, C., Covell, M., Slaney, M. (1997). Video rewrite: Driving visual speech with audio. In: ACM SIGGRAPH’97, Los Angeles, CA, 353–360.
Cosatto, E., Potamianos, G., Graf, H. P. (2000). Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE Int. Conf. on Multimedia and Expo, New York, 619–622.
Ezzat, T., Poggio, T. (1998). MikeTalk: A talking facial display based on morphing visemes. In: Computer Animation Conf., Philadelphia, PA, 456–459.
Gutierrez-Osuna, R., Rudomin, J. L. (2005). Speech-driven facial animation with realistic dynamics. IEEE Trans. Multimedia, 7, 33–42.
Hong, P. Y., Wen, Z., Huang, T. S. (2002). Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Netw., 13, 916–927.
Verma, A., Subramaniam, L. V., Rajput, N., Neti, C., Faruquie, T. A. (2004). Animating expressive faces across languages. IEEE Trans. Multimedia, 6, 791–800.
Collier, G. (1985). Emotional Expression. Lawrence Erlbaum Associates. http://faculty.uccb.ns.ca/~gcollier/
Argyle, M. (1988). Bodily Communication. Methuen & Co, New York, NY.
Siegman, A. W., Feldstein, S. (1985). Multichannel Integrations of Nonverbal Behavior, Lawrence Erlbaum Associates, Hillsdale, NJ.
Feldman, R. S., Philippot, P., Custrini, R. J. (1991). Social competence and nonverbal behavior. In: Rimé, R. S. F. B. (ed) Fundamentals of Nonverbal Behavior. Cambridge University Press, Cambridge, 329–350.
Knapp, M. L., Hall, J. A. (2006). Nonverbal Communication in Human Interaction, 6th edn. Thomson Wadsworth, Belmont, CA.
Go, H. J., Kwak, K. C., Lee, D. J., Chun, M. G. (2003). Emotion recognition from facial image and speech signal. In: Int. Conf. Society of Instrument and Control Engineers, Fukui, Japan, 2890–2895.
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C. M. et al. (2004). Analysis of emotion recognition using facial expressions, speech and multimodal information. In: Int. Conf. on Multimodal Interfaces, State College, PA, 205–211.
Song, M., Bu, J., Chen, C., Li, N. (2004). Audio-visual based emotion recognition – A new approach. In: Int. Conf. on Computer Vision and Pattern Recognition, Washington, DC, USA, 1020–1025.
Zeng, Z., Tu, J., Liu, M., Zhang, T., Rizzolo, N., Zhang, Z., Huang, T. S., Roth, D., Levinson, S. (2004). Bimodal HCI-related emotion recognition. In: Int. Conf. on Multimodal Interfaces, State College, PA, 137–143.
Zeng, Z., Tu, J., Pianfetti, B., Huang, T. S. Audio-visual affective expression recognition through multi-stream fused HMM. IEEE Trans. Multimedia, 10(4), 570–577.
Zeng, Z., Tu, J., Liu, M., Huang, T. S., Pianfetti, B., Roth D., Levinson, S. (2007). Audio-visual affect recognition. IEEE Trans. Multimedia, 9 (2), 424–428.
Wang, Y., Guan, L. (2005). Recognizing human emotion from audiovisual information. In: ICASSP, Philadelphia, PA, Vol. II, 1125–1128.
Hoch, S., Althoff, F., McGlaun, G., Rigoll, G. (2005). Bimodal fusion of emotional data in an automotive environment. In: ICASSP, Philadelphia, PA, Vol. II, 1085–1088.
Fragopanagos, F., Taylor, J. G. (2005). Emotion recognition in human-computer interaction. Neural Netw., 18, 389–405.
Pal, P., Iyer, A. N., Yantorno, R. E. (2006). Emotion detection from infant facial expressions and cries. In: Proc. Int’l Conf. on Acoustics, Speech & Signal Processing, Philadelphia, PA, 2, 721–724.
Caridakis, G., Malatesta, L., Kessous, L., Amir, N., Paouzaiou, A., Karpouzis, K. (2006). Modeling naturalistic affective states via facial and vocal expression recognition. In: Int. Conf. on Multimodal Interfaces, Banff, Alberta, Canada, 146–154.
Karpouzis, K., Caridakis, G., Kessous, L., Amir, N., Raouzaiou, A., Malatesta, L., Kollias, S. (2007). Modeling naturalistic affective states via facial, vocal, and bodily expression recognition. In: Lecture Notes in Artificial Intelligence, vol. 4451, 91–112.
Chen, C. Y., Huang, Y. K., Cook, P. (2005). Visual/Acoustic emotion recognition. In: Proc. Int. Conf. on Multimedia and Expo, Amsterdam, Netherlands, 1468–1471.
Picard, R. W. (2003). Affective computing: Challenges. Int. J. Hum. Comput. Studies, vol. 59, 55–64.
Ortony, A., Clore, G. L., Collins, A. (1990). The Cognitive Structure of Emotions. Cambridge University Press, Cambridge.
Carberry, S., de Rosis, F. (2008). Introduction to the Special Issue of UMUAI on ‘Affective Modeling and Adaptation’, International Journal of User Modeling and User-Adapted Interaction, vol. 18, 1–9.
Esposito, A., Balodis, G., Ferreira, A., Cristea, G. (2006). Cross-Modal Analysis of Verbal and Non-verbal Communication. Proposal for a COST Action.
Yin, P. R., Tao, J. H. (2005). Dynamic mapping method based speech driven face animation system. In: The 1st Int. Conf. on Affective Computing and Intelligent Interaction (ACII2005), Beijing, 755–763.
O’Brien, J. F., Bodenheimer, B., Brostow, G., Hodgins, J. (2000). Automatic joint parameter estimation from magnetic motion capture data. In: Graphics Interface 2000, Montreal, Canada, 53–60.
Aggarwal, J. K., Cai, Q. (1999). Human motion analysis: A review. Comput. Vision Image Understand., 73(3), 428–440.
Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Comput. Vision Image Understand., 73(1), 82–98.
Azarbayejani, A., Wren, C., Pentland, A. (1996). Real-time 3-D tracking of the human body. In: IMAGE’COM 96, Bordeaux, France.
Camurri, A., Poli, G. D., Leman, M., Volpe, G. (2001). A multi-layered conceptual framework for expressive gesture applications. In: Intl. EU-TMR MOSART Workshop, Barcelona.
Cowie, R. (2001). Emotion recognition in human-computer interaction. IEEE Signal Process. Mag., 18(1), 32–80.
Brunelli, R., Falavigna, D. (1995). Person identification using multiple cues. IEEE Trans. Pattern Anal. Mach. Intell., 17(10), 955–966.
Kumar, A., Wong, D. C., Shen, H. C., Jain, A. K. (2003). Personal verification using palmprint and hand geometry biometric. In: 4th Int. Conf. on Audio- and Video-based Biometric Person Authentication, Guildford, UK, 668–678.
Frischholz, R. W., Dieckmann, U. (2000). BioID: A multimodal biometric identification system. IEEE Comput., 33(2), 64–68.
Jain, A. K., Ross, A. (2002). Learning user-specific parameters in a multibiometric system. In: Int. Conf. on Image Processing (ICIP), Rochester, New York, 57–60.
Ho, T. K., Hull, J. J., Srihari, S. N. (1994). Decision combination in multiple classifier systems. IEEE Trans. Pattern Anal. Mach. Intell., 16(1), 66–75.
Kittler, J., Hatef, M., Duin, R. P. W., Matas, J. (1998). On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell., 20(3), 226–239.
Dieckmann, U., Plankensteiner, P., Wagner, T. (1997). SESAM: A biometric person identification system using sensor fusion. Pattern Recognit. Lett., 18, 827–833.
De Silva, L. C., Miyasato, T., Nakatsu, R. (1997). Facial emotion recognition using multi-modal information. In: Proc. Int. Conf. on Information and Communications and Signal Processing, Singapore, 397–401.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 60575032 and the 863 program under Grant 2006AA01Z138.
© 2010 Springer Science+Business Media, LLC
Cite this chapter
Tao, J. (2010). Multimodal Information Processing for Affective Computing. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_9
Print ISBN: 978-0-387-73818-5
Online ISBN: 978-0-387-73819-2