
A Proposal for a Visual Speech Animation System for European Portuguese

  • Conference paper
Advances in Speech and Language Technologies for Iberian Languages

Abstract

Visual speech animation, or lip synchronization, is the process of matching speech with the lip movements of a virtual character. It is a challenging task because all articulatory movements must be controlled and synchronized with the audio signal. Existing language-independent systems usually require fine-tuning by an artist to avoid artefacts appearing in the animation. In this paper, we present a modular visual speech animation framework aimed at speeding up and easing the visual speech animation process compared with traditional techniques. We demonstrate the potential of the framework by developing the first automatic visual speech animation system for European Portuguese based on the concatenation of visemes. We also present the results of a preliminary evaluation carried out to assess the quality of two different phoneme-to-viseme mappings devised for the language.
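The abstract describes a concatenative, viseme-based pipeline: phonemes aligned with the audio are mapped to visemes (mouth shapes), and the resulting viseme sequence drives the animation of the virtual character. The sketch below is a minimal illustration of that general idea only; the phoneme symbols, viseme names, mapping table, and timings are hypothetical placeholders and are not the phoneme-to-viseme mappings evaluated in the paper.

```python
# Minimal sketch of viseme concatenation driven by a phoneme-to-viseme
# mapping. All symbols, viseme names and timings below are illustrative
# placeholders, not the mappings proposed for European Portuguese.

from dataclasses import dataclass

# Hypothetical many-to-one phoneme-to-viseme mapping.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "E": "mid_open", "O": "rounded", "u": "rounded",
    "s": "fricative", "S": "fricative",
}

@dataclass
class VisemeKeyframe:
    viseme: str    # name of the mouth shape (e.g. a blendshape) to apply
    start: float   # seconds, aligned with the audio track
    end: float

def phonemes_to_keyframes(aligned_phonemes):
    """Convert (phoneme, start, end) triples from a phonetic alignment
    into viseme keyframes, merging consecutive identical visemes."""
    keyframes = []
    for phoneme, start, end in aligned_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if keyframes and keyframes[-1].viseme == viseme:
            keyframes[-1].end = end      # extend the previous keyframe
        else:
            keyframes.append(VisemeKeyframe(viseme, start, end))
    return keyframes

if __name__ == "__main__":
    # "pato" /patu/ with made-up timings in seconds
    alignment = [("p", 0.00, 0.08), ("a", 0.08, 0.22),
                 ("t", 0.22, 0.30), ("u", 0.30, 0.45)]
    for kf in phonemes_to_keyframes(alignment):
        print(f"{kf.viseme:12s} {kf.start:.2f}-{kf.end:.2f}")
```

Simple concatenation of this kind produces one keyframe per viseme; a full system would additionally need to handle coarticulation between adjacent mouth shapes, which this sketch deliberately ignores.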






Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Serra, J., Ribeiro, M., Freitas, J., Orvalho, V., Dias, M.S. (2012). A Proposal for a Visual Speech Animation System for European Portuguese. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_28


  • DOI: https://doi.org/10.1007/978-3-642-35292-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35291-1

  • Online ISBN: 978-3-642-35292-8

  • eBook Packages: Computer Science (R0)
