Modeling Multimodal Behaviors from Speech Prosody

  • Conference paper
Intelligent Virtual Agents (IVA 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8108)

Abstract

Head and eyebrow movements are an important means of communication and are highly synchronized with speech prosody. Endowing a virtual agent with synchronized verbal and nonverbal behavior enhances its communicative performance. In this paper, we propose an animation model for a virtual agent based on a statistical model linking speech prosody and facial movement. A fully parameterized Hidden Markov Model is first trained to capture the tight relationship between speech and the facial movements of a human face, extracted from a video corpus, and is then used to automatically drive the virtual agent's behaviors from speech signals. The correlation between head and eyebrow movements is also taken into account in building the model. Subjective and objective evaluations were conducted to validate the model.
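
To make the speech-to-motion pipeline concrete, here is a minimal, self-contained sketch of the general idea: a Gaussian-emission HMM decodes a hidden state sequence from per-frame prosodic features, and each state is mapped to facial-motion parameters. This is an illustration only, not the paper's fully parameterized HMM; every numeric value, the two-dimensional feature choices, and the `log_gauss`/`viterbi` helpers are hypothetical.

```python
# Sketch: decode HMM states from prosody, then emit facial parameters
# per state. All model parameters below are illustrative placeholders,
# standing in for values a trained model would provide.
import numpy as np

n_states = 3
log_pi = np.log(np.array([0.6, 0.3, 0.1]))        # initial state probs
log_A = np.log(np.array([[0.80, 0.15, 0.05],      # state transitions
                         [0.10, 0.80, 0.10],
                         [0.05, 0.15, 0.80]]))
# Per-state diagonal Gaussians over 2-D prosody (normalized F0, energy).
means = np.array([[-1.0, -0.5], [0.0, 0.3], [1.2, 0.9]])
var = np.array([[0.5, 0.4], [0.4, 0.5], [0.6, 0.5]])
# Per-state facial output (head pitch in degrees, eyebrow raise in [0, 1]).
facial = np.array([[0.0, 0.1], [2.5, 0.4], [6.0, 0.9]])

def log_gauss(x):
    """Log-likelihood of one prosody frame under each state's Gaussian."""
    return -0.5 * np.sum((x - means) ** 2 / var + np.log(2 * np.pi * var),
                         axis=1)

def viterbi(obs):
    """Most likely state sequence for a (T, 2) prosody track."""
    T = len(obs)
    delta = np.empty((T, n_states))             # best log-score per state
    psi = np.zeros((T, n_states), dtype=int)    # backpointers
    delta[0] = log_pi + log_gauss(obs[0])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # (from-state, to-state)
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(n_states)] + log_gauss(obs[t])
    states = np.empty(T, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):              # backtrack
        states[t] = psi[t + 1, states[t + 1]]
    return states

# Drive facial motion from a toy prosody track with rising pitch/energy.
prosody = np.array([[-1.1, -0.4], [-0.2, 0.2], [0.9, 0.8], [1.3, 1.0]])
motion = facial[viterbi(prosody)]               # (T, 2) facial trajectory
print(motion)
```

The sketch replaces the paper's joint, parameterized emission model with fixed per-state facial outputs to keep the example short; in the actual model the hidden states couple the audio and visual streams, so head and eyebrow trajectories are generated jointly rather than looked up per state.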

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

Cite this paper

Ding, Y., Pelachaud, C., Artières, T. (2013). Modeling Multimodal Behaviors from Speech Prosody. In: Aylett, R., Krenn, B., Pelachaud, C., Shimodaira, H. (eds) Intelligent Virtual Agents. IVA 2013. Lecture Notes in Computer Science (LNAI), vol 8108. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40415-3_19

  • DOI: https://doi.org/10.1007/978-3-642-40415-3_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40414-6

  • Online ISBN: 978-3-642-40415-3

  • eBook Packages: Computer Science (R0)
