Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation

  • Conference paper
Advances in Artificial Intelligence - SBIA 2008 (SBIA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)

Abstract

In this paper, the inversion of a joint Audio-Visual Hidden Markov Model is proposed to estimate visual information from speech data in a speech-driven, MPEG-4 compliant facial animation system. The inversion algorithm is derived for the general case of full covariance matrices for the audio-visual observations. System performance is evaluated for both full and diagonal covariance matrices. Experimental results show that full covariance matrices are preferable, since they achieve performance similar to that of diagonal matrices while using a less complex model. The experiments are carried out using audio-visual databases compiled by the authors.
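
To give some intuition for why the covariance structure matters, the sketch below estimates a visual parameter from audio features under a single joint Gaussian. This is only an illustration and not the HMM inversion algorithm derived in the paper: it assumes one Gaussian state, and the function names (`solve`, `conditional_visual_mean`) and all numbers are hypothetical. With a full covariance matrix, the audio-visual cross-covariance lets the audio observation shift the visual estimate; with a diagonal covariance the cross term is zero and the estimate falls back to the visual mean.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def conditional_visual_mean(audio, mu_a, mu_v, cov_aa, cov_va):
    """Conditional mean of a joint Gaussian: mu_v + cov_va * inv(cov_aa) * (audio - mu_a)."""
    diff = [a - m for a, m in zip(audio, mu_a)]
    w = solve(cov_aa, diff)  # inv(cov_aa) * (audio - mu_a) without forming the inverse
    return [m + sum(c * wi for c, wi in zip(row, w)) for m, row in zip(mu_v, cov_va)]


# Hypothetical toy model: 2 audio features, 1 visual parameter.
mu_a, mu_v = [0.0, 0.0], [1.0]
cov_aa = [[2.0, 0.0], [0.0, 1.0]]
audio = [2.0, 2.0]

# Full covariance: a non-zero audio-visual cross-covariance block.
full = conditional_visual_mean(audio, mu_a, mu_v, cov_aa, [[1.0, 0.5]])
# Diagonal covariance: the cross-covariance block is zero.
diag = conditional_visual_mean(audio, mu_a, mu_v, cov_aa, [[0.0, 0.0]])
print(full, diag)
```

In this toy setting the diagonal model ignores the audio entirely within a state, so any audio-to-visual dependence would have to be carried by using more states or mixture components, which is consistent with the abstract's remark that full covariances reach similar performance with a less complex model.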

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Terissi, L.D., Gómez, J.C. (2008). Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_9

  • DOI: https://doi.org/10.1007/978-3-540-88190-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88189-6

  • Online ISBN: 978-3-540-88190-2

  • eBook Packages: Computer Science, Computer Science (R0)
