Towards Videorealistic Synthetic Visual Speech

Theobald, Barry; Bangham, J. Andrew; Kruse, Silko; Cawley, Gavin; Matthews, Iain

doi:10.1007/978-1-4615-0813-7_15

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 704)

Abstract

In this paper we present preliminary results of work towards a videorealistic visual speech synthesiser. A generative model is used to track the face of a talker uttering a series of training sentences, and an inventory of synthesis units is built by representing the trajectories of the model parameters with spline curves. The model parameters for a new utterance are formed by concatenating the spline segments of the corresponding synthesis units in the inventory and sampling the result at the original frame rate. The new parameters are then applied to the model to create a sequence of images of the talking face.
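The concatenate-and-resample step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which spline family, frame rate, or unit inventory was used, so the Catmull-Rom interpolating spline, the 25 fps frame rate, the unit names, and the `INVENTORY` layout below are all assumptions made for illustration. Each hypothetical unit stores knot times (starting at 0, in seconds) and one track of knot values per model parameter.

```python
from bisect import bisect_right

FRAME_RATE = 25.0  # assumed frames per second; the abstract does not state it


def catmull_rom(times, values, t):
    """Evaluate a Catmull-Rom (cubic Hermite) spline through the knots at time t."""
    n = len(times)
    if n == 1:
        return values[0]
    # Clamp t to the knot range and find the interval [times[i], times[i + 1]].
    t = min(max(t, times[0]), times[-1])
    i = min(bisect_right(times, t) - 1, n - 2)
    h = times[i + 1] - times[i]
    u = (t - times[i]) / h

    def slope(k):
        # Finite-difference tangent, one-sided at the first and last knot.
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        return (values[hi] - values[lo]) / (times[hi] - times[lo])

    m0, m1 = slope(i) * h, slope(i + 1) * h
    h00 = 2 * u**3 - 3 * u**2 + 1
    h10 = u**3 - 2 * u**2 + u
    h01 = -2 * u**3 + 3 * u**2
    h11 = u**3 - u**2
    return h00 * values[i] + h10 * m0 + h01 * values[i + 1] + h11 * m1


def synthesise(unit_names, inventory, frame_rate=FRAME_RATE):
    """Concatenate the named units in time, then sample every parameter per frame."""
    times, tracks, offset = [], None, 0.0
    for name in unit_names:
        unit = inventory[name]
        if tracks is None:
            tracks = [[] for _ in unit["tracks"]]
        for k, t in enumerate(unit["times"]):
            if times and k == 0:
                # Boundary knot shared with the previous unit: average the two values.
                for track, unit_track in zip(tracks, unit["tracks"]):
                    track[-1] = 0.5 * (track[-1] + unit_track[0])
                continue
            times.append(offset + t)
            for track, unit_track in zip(tracks, unit["tracks"]):
                track.append(unit_track[k])
        offset = times[-1]
    n_frames = int(round(times[-1] * frame_rate)) + 1
    return [
        [catmull_rom(times, track, f / frame_rate) for track in tracks]
        for f in range(n_frames)
    ]


# Hypothetical two-unit inventory with a single model parameter per unit.
INVENTORY = {
    "a": {"times": [0.0, 0.04, 0.08], "tracks": [[0.0, 1.0, 2.0]]},
    "b": {"times": [0.0, 0.04, 0.08], "tracks": [[2.0, 1.0, 0.0]]},
}

frames = synthesise(["a", "b"], INVENTORY)
```

Each entry of `frames` is one vector of model parameters, which would then be passed to the generative model to render an image. Averaging the shared boundary knot is only one simple way to avoid a jump at the join; a smoothing spline fitted across the boundary would be a reasonable alternative.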

Author information

Authors and Affiliations

University of East Anglia, Norwich, NR4 7TJ, UK
Barry Theobald, J. Andrew Bangham, Silko Kruse & Gavin Cawley
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Iain Matthews

Editor information

Editors and Affiliations

University of Sheffield, UK
Joab Winkler & Mahesan Niranjan

About this chapter

Cite this chapter

Theobald, B., Bangham, J.A., Kruse, S., Cawley, G., Matthews, I. (2002). Towards Videorealistic Synthetic Visual Speech. In: Winkler, J., Niranjan, M. (eds) Uncertainty in Geometric Computations. The Springer International Series in Engineering and Computer Science, vol 704. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0813-7_15

DOI: https://doi.org/10.1007/978-1-4615-0813-7_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5252-5
Online ISBN: 978-1-4615-0813-7
eBook Packages: Springer Book Archive
