Viseme Classification for Talking Head Application

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3691)

Abstract

Real-time classification algorithms are presented for visual mouth appearances (visemes), which correspond to phonemes and their speech contexts. They are used in the design of a talking head application. Two feature extraction procedures were verified. The first is based on a normalized triangle mesh covering the mouth area and a color image texture vector indexed by barycentric coordinates. The second applies the Discrete Fourier Transform (DFT) to the image rectangle containing the mouth and keeps a small block of DFT coefficients. The classifier is designed by an optimized LDA method that uses a two singular subspace approach. Despite its higher computational complexity (about three milliseconds per video frame on a Pentium IV 3.2 GHz), the DFT+LDA approach has practical advantages over the MESH+LDA classifier. First, its recognition rate is higher by more than two percentage points (99.3% versus 97.2%). Second, automatic identification of the rectangle covering the mouth is more robust than automatic identification of the triangle mesh covering the mouth.
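The second feature extraction procedure can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which coefficients are kept or how they are encoded, so the function name `dft_block_features`, the use of grayscale values, the choice of the lowest-frequency block, and the use of coefficient magnitudes are all assumptions made for illustration. The sketch evaluates only the requested coefficients directly from the DFT definition, which is one plausible way to keep the per-frame cost low when the block is small.

```python
import cmath


def dft_block_features(image, block=4):
    """Magnitudes of the block x block lowest-frequency 2D DFT coefficients.

    `image` is a list of equal-length rows of grayscale values (a
    hypothetical mouth rectangle). Only the requested coefficients are
    evaluated, straight from the DFT definition, so no full transform
    is computed. Block position and size are illustrative assumptions.
    """
    rows, cols = len(image), len(image[0])
    features = []
    for u in range(block):          # vertical frequency index
        for v in range(block):      # horizontal frequency index
            coeff = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2.0 * cmath.pi * (u * y / rows + v * x / cols)
                    coeff += image[y][x] * cmath.exp(1j * angle)
            features.append(abs(coeff))
    return features


# Toy 4x8 rectangle: a dark horizontal band on a bright background.
toy = [[200] * 8, [40] * 8, [40] * 8, [200] * 8]
features = dft_block_features(toy, block=2)
print(features)
```

In a full pipeline, a vector like `features` would be passed to the LDA-based classifier. Because the toy image varies only vertically, the coefficients with a nonzero horizontal frequency index come out as (numerically) zero.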

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leszczynski, M., Skarbek, W. (2005). Viseme Classification for Talking Head Application. In: Gagalowicz, A., Philips, W. (eds) Computer Analysis of Images and Patterns. CAIP 2005. Lecture Notes in Computer Science, vol 3691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11556121_95

  • DOI: https://doi.org/10.1007/11556121_95

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28969-2

  • Online ISBN: 978-3-540-32011-1

  • eBook Packages: Computer Science, Computer Science (R0)
