Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation

Terissi, Lucas D.; Gómez, Juan Carlos

doi:10.1007/978-3-540-88190-2_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)

Included in the following conference series:

Brazilian Symposium on Artificial Intelligence

Abstract

In this paper, the inversion of a joint Audio-Visual Hidden Markov Model is proposed to estimate visual information from speech data in a speech-driven, MPEG-4 compliant facial animation system. The inversion algorithm is derived for the general case of full covariance matrices for the audio-visual observations. System performance is evaluated for both full and diagonal covariance matrices. Experimental results show that full covariance matrices are preferable, since they achieve performance similar to that of diagonal matrices while using a less complex model. The experiments are carried out using audio-visual databases compiled by the authors.
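
As a rough illustration of why the covariance structure matters, the sketch below uses a single joint Gaussian over one audio feature and one visual feature, which is a standard textbook simplification and not the authors' inversion algorithm. With a nonzero audio-visual covariance term, the conditional mean of the visual feature shifts with the observed audio value; with a diagonal covariance it stays at the visual mean. The function name and all parameter values are hypothetical.

```python
def conditional_visual_mean(audio, mean_a, mean_v, var_a, cov_av):
    """Return E[v | a] for a jointly Gaussian scalar pair (a, v).

    cov_av is the off-diagonal (audio-visual) covariance term;
    setting it to 0.0 corresponds to a diagonal covariance matrix.
    """
    return mean_v + (cov_av / var_a) * (audio - mean_a)


# Hypothetical parameters of one model state (illustrative values only).
mean_a, mean_v = 0.0, 1.0
var_a = 2.0
cov_av = 1.0

observed_audio = 1.0

# Full covariance: the audio observation informs the visual estimate.
full_estimate = conditional_visual_mean(observed_audio, mean_a, mean_v, var_a, cov_av)

# Diagonal covariance: the estimate falls back to the visual mean.
diag_estimate = conditional_visual_mean(observed_audio, mean_a, mean_v, var_a, 0.0)

print(full_estimate, diag_estimate)
```

In a diagonal-covariance model, the dependence between audio and visual features within a state is lost, so it has to be captured by other means, such as more states or mixture components. This is consistent with the abstract's remark that the full-covariance case reaches similar performance with a less complex model.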

Author information

Authors and Affiliations

Laboratory for System Dynamics and Signal Processing, FCEIA, Universidad Nacional de Rosario, CIFASIS, CONICET, Riobamba 245bis, 2000, Rosario, Argentina
Lucas D. Terissi & Juan Carlos Gómez

Editor information

Editors and Affiliations

Department of Systems Engineering and Computer Science - COPPE, Federal University of Rio de Janeiro (UFRJ), Brazil
Gerson Zaverucha
Department of Automation and Systems, Federal University of Santa Catarina, CEP 88.040-900, Brazil
Augusto Loureiro da Costa

About this paper

Cite this paper

Terissi, L.D., Gómez, J.C. (2008). Audio-to-Visual Conversion Via HMM Inversion for Speech-Driven Facial Animation. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_9

DOI: https://doi.org/10.1007/978-3-540-88190-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science, Computer Science (R0)
