Viseme Classification for Talking Head Application

Leszczynski, Mariusz; Skarbek, Władysław

doi:10.1007/11556121_95

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3691)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns

Abstract

Real-time classification algorithms are presented for visual mouth appearances (visemes), which correspond to phonemes and their speech contexts. They are used in the design of a talking head application. Two feature extraction procedures were evaluated. The first is based on a normalized triangle mesh covering the mouth area and a color image texture vector indexed by barycentric coordinates. The second applies the Discrete Fourier Transform (DFT) to the image rectangle containing the mouth and keeps only a small block of DFT coefficients. The classifier was designed with an optimized Linear Discriminant Analysis (LDA) method that uses a two singular subspaces approach. Despite its higher computational complexity (about three milliseconds per video frame on a 3.2 GHz Pentium IV), the DFT+LDA approach has practical advantages over the MESH+LDA classifier. First, its recognition rate is higher by more than two percentage points (99.3% versus 97.2%). Second, automatic identification of the rectangle covering the mouth is more robust than automatic identification of the triangle mesh covering the mouth.
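The DFT-based feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dft_block_features`, the 4x4 block size, the choice of the lowest-frequency corner of the spectrum, the use of coefficient magnitudes, and the grayscale input are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def dft_block_features(mouth_gray, block=4):
    """Hypothetical helper: features from a small block of 2-D DFT coefficients.

    mouth_gray: 2-D array holding the grayscale mouth rectangle (assumed input).
    block:      side length of the retained coefficient block (assumed value).
    """
    img = np.asarray(mouth_gray, dtype=np.float64)
    # Full 2-D DFT of the mouth rectangle.
    spectrum = np.fft.fft2(img)
    # Keep only the block of coefficients at the lowest non-negative frequencies.
    low = spectrum[:block, :block]
    # Use magnitudes, divided by the pixel count so that the features
    # do not grow with the size of the rectangle.
    return (np.abs(low) / img.size).ravel()


# Synthetic stand-in for a cropped mouth rectangle (32 rows, 48 columns).
rng = np.random.default_rng(0)
frame = rng.random((32, 48))
features = dft_block_features(frame, block=4)
```

With these assumptions each frame yields a 16-element vector whose first entry equals the mean intensity of the rectangle. A vector of this kind would then be passed to a discriminant classifier such as LDA; the optimized variant mentioned in the abstract is not reproduced here.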

Author information

Authors and Affiliations

Faculty of Electronics and Information Technology, Warsaw University of Technology,
Mariusz Leszczynski & Władysław Skarbek

Editor information

Editors and Affiliations

INRIA-Rocquencourt, Domaine de Voluceau, BP105, 78153, Le Chesnay, France
André Gagalowicz
Ghent University, 9000, Gent, Belgium
Wilfried Philips

About this paper

Cite this paper

Leszczynski, M., Skarbek, W. (2005). Viseme Classification for Talking Head Application. In: Gagalowicz, A., Philips, W. (eds) Computer Analysis of Images and Patterns. CAIP 2005. Lecture Notes in Computer Science, vol 3691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11556121_95

DOI: https://doi.org/10.1007/11556121_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28969-2
Online ISBN: 978-3-540-32011-1
eBook Packages: Computer Science, Computer Science (R0)
