Abstract
We present four techniques for modeling and animating faces starting from a set of morph targets. The first technique obtains parameters that control individual facial components and uses machine learning to map one type of parameter to another. The second technique fuses visible speech and facial expressions in the lower part of the face. The third technique combines coarticulation rules with kernel smoothing. Finally, a new 3D tongue model with flexible and intuitive skeleton controls is presented. Results on eight animated character models demonstrate that these techniques are effective.
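The abstract does not spell out how morph targets and kernel smoothing fit together, so the following is only a minimal, generic sketch and not the authors' method. It assumes a linear morph-target blend and a Gaussian Nadaraya-Watson smoother applied to a keyframed blend weight; the function names `blend_morph_targets` and `kernel_smooth`, the bandwidth value, and the toy data are all hypothetical.

```python
import math


def kernel_smooth(times, values, query_times, bandwidth):
    """Nadaraya-Watson estimate of a keyframed weight track (Gaussian kernel).

    Assumption: this is a textbook smoother, not the paper's formulation.
    """
    smoothed = []
    for t in query_times:
        kernel = [math.exp(-0.5 * ((t - ti) / bandwidth) ** 2) for ti in times]
        total = sum(kernel)
        if total == 0.0:
            # Query is too far from every keyframe: fall back to the nearest one.
            nearest = min(range(len(times)), key=lambda i: abs(t - times[i]))
            smoothed.append(values[nearest])
        else:
            smoothed.append(sum(k * v for k, v in zip(kernel, values)) / total)
    return smoothed


def blend_morph_targets(neutral, targets, weights):
    """Linear blend: neutral + sum_i w_i * (target_i - neutral).

    Each shape is a flat list of vertex coordinates of equal length.
    """
    blended = list(neutral)
    for target, w in zip(targets, weights):
        for k in range(len(blended)):
            blended[k] += w * (target[k] - neutral[k])
    return blended


if __name__ == "__main__":
    # Hypothetical "mouth open" weight keyed at three phoneme centers (seconds).
    key_times = [0.0, 0.1, 0.2]
    key_weights = [0.0, 1.0, 0.0]
    frames = [i / 30.0 for i in range(7)]  # 30 fps sampling
    track = kernel_smooth(key_times, key_weights, frames, bandwidth=0.05)

    neutral = [0.0, 0.0, 0.0]
    mouth_open = [0.0, -1.0, 0.0]
    for t, w in zip(frames, track):
        print(round(t, 3), blend_morph_targets(neutral, [mouth_open], [w]))
```

In this sketch the smoothed weight never reaches the keyed extreme of 1.0, because neighboring keyframes pull it toward their values. That loosely mirrors how coarticulation makes a speech target undershoot in context, though the rules used in the paper are not given in the abstract.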
Cite this article
Ma, J., Cole, R. Animating visible speech and facial expressions. Visual Comp 20, 86–105 (2004). https://doi.org/10.1007/s00371-003-0234-y
DOI: https://doi.org/10.1007/s00371-003-0234-y