Animating visible speech and facial expressions

Ma, Jiyong; Cole, Ronald

doi:10.1007/s00371-003-0234-y

original article
Published: 04 March 2004

Volume 20, pages 86–105 (2004)
The Visual Computer

Abstract

We present four techniques for modeling and animating faces starting from a set of morph targets. The first technique involves obtaining parameters to control individual facial components and learning the mapping from one type of parameter to another through machine learning techniques. The second technique is to fuse visible speech and facial expressions in the lower part of a face. The third technique combines coarticulation rules and kernel smoothing techniques. Finally, a new 3D tongue model with flexible and intuitive skeleton controls is presented. The results of eight animated character models demonstrate that these techniques are powerful and effective.
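
The abstract names morph targets and kernel smoothing without giving formulas, so the following is only a minimal sketch of how those two ideas are commonly expressed, not the authors' method. It assumes a linear morph-target blend (neutral shape plus weighted offsets) and a Nadaraya-Watson estimate with a Gaussian kernel for smoothing keyframed weights over time. The function names, the flattened vertex layout, and the bandwidth value are all hypothetical.

```python
import math


def blend_morph_targets(neutral, targets, weights):
    """Linear blend: neutral + sum_i w_i * (target_i - neutral).

    Shapes are flat lists of vertex coordinates (an assumed layout).
    """
    blended = [float(v) for v in neutral]
    for target, w in zip(targets, weights):
        for k, (t, n) in enumerate(zip(target, neutral)):
            blended[k] += w * (t - n)
    return blended


def kernel_smooth(times, values, query_time, bandwidth):
    """Nadaraya-Watson estimate of a keyframed weight at query_time.

    Uses a Gaussian kernel; bandwidth controls how far each keyframe's
    influence spreads in time (a stand-in for coarticulation spread).
    """
    kernel = [
        math.exp(-0.5 * ((query_time - t) / bandwidth) ** 2) for t in times
    ]
    total = sum(kernel)
    if total == 0.0:
        # All kernel values underflowed: fall back to the nearest keyframe.
        nearest = min(range(len(times)), key=lambda i: abs(times[i] - query_time))
        return values[nearest]
    return sum(k * v for k, v in zip(kernel, values)) / total


# Hypothetical data: one "jaw open" target keyed at three times (seconds).
neutral = [0.0, 0.0, 0.0]
jaw_open = [0.0, -1.0, 0.0]
key_times = [0.0, 0.1, 0.2]
key_weights = [0.0, 1.0, 0.0]

w = kernel_smooth(key_times, key_weights, query_time=0.1, bandwidth=0.05)
frame = blend_morph_targets(neutral, [jaw_open], [w])
```

Because neighbouring keyframes contribute to the estimate, the smoothed weight at the middle keyframe stays below its keyed value of 1.0. That is the kind of softened peak a smoothing step produces, and a smaller bandwidth would track the keyframes more closely.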

Author information

Authors and Affiliations

Center for Spoken Language Research, University of Colorado at Boulder, Campus Box 594, Boulder, CO, 80309-0594, USA
Jiyong Ma & Ronald Cole

Corresponding author

Correspondence to Jiyong Ma.

About this article

Cite this article

Ma, J., Cole, R. Animating visible speech and facial expressions. Visual Comp 20, 86–105 (2004). https://doi.org/10.1007/s00371-003-0234-y

Issue Date: May 2004
