Learning Fuzzy Rules for Visual Speech Recognition

  • Conference paper
Adaptive Multimedia Retrieval (AMR 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3094)

Abstract

We outline a method for learning fuzzy rules for visual speech recognition. Such a system could be used for automatic annotation of video sequences to aid subsequent retrieval; it could also improve the recognition of voice commands when a system has no keyboard. In the implemented system, features were extracted automatically from short video sequences by identifying regions of the face and tracking the movement of various points around the mouth from frame to frame. The words in the video sequences were segmented manually on phoneme boundaries, and a rule base was constructed using two-dimensional fuzzy sets on feature and time parameters. The method was applied to the Tulips1 database, and the results were slightly better than those obtained with techniques based on neural networks and Hidden Markov Models. This suggests that the learned rules are speaker-independent. A medium-sized vocabulary of around 300 words, representative of the phonemes of the English language, was created and used for training and testing. Reasonable accuracy for phoneme classification was achieved. Because of the ambiguity and similarity of various speech sounds, a scheme was developed to select a group of candidate words when a test word was presented to the system. The accuracy achieved was 21–33%, comparable to that of expert human lip-readers, whose accuracy on nonsense words is about 30%.
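
The idea of a two-dimensional fuzzy set on feature and time parameters, and of returning a group of candidate words rather than a single answer, can be illustrated with a minimal sketch. This is not the authors' implementation: the triangular membership shape, the product combination of the time and feature memberships, the averaged support score, and all names (`triangular`, `FuzzyPatch`, `support`, `candidate_group`) are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]        # (normalised time, feature value)
Triangle = Tuple[float, float, float]  # (left, peak, right), with left < peak < right


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 outside (left, right), rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


class FuzzyPatch:
    """Hypothetical 2-D fuzzy set over (time, feature).

    Assumption: membership is the product of two triangular sets,
    one on the time axis and one on the feature axis.
    """

    def __init__(self, time_tri: Triangle, feature_tri: Triangle) -> None:
        self.time_tri = time_tri
        self.feature_tri = feature_tri

    def membership(self, t: float, f: float) -> float:
        return triangular(t, *self.time_tri) * triangular(f, *self.feature_tri)


def support(rule: List[FuzzyPatch], trajectory: List[Point]) -> float:
    """Average, over the trajectory, of the best patch membership at each point."""
    if not trajectory or not rule:
        return 0.0
    total = sum(max(p.membership(t, f) for p in rule) for t, f in trajectory)
    return total / len(trajectory)


def candidate_group(
    rules: Dict[str, List[FuzzyPatch]], trajectory: List[Point], k: int = 3
) -> List[str]:
    """Return the k labels whose rules best support the observed trajectory."""
    ranked = sorted(
        rules, key=lambda label: support(rules[label], trajectory), reverse=True
    )
    return ranked[:k]


# Toy rule base with made-up labels and parameters (illustration only).
rules = {
    "open": [FuzzyPatch((0.0, 0.5, 1.0), (0.5, 1.0, 1.5))],
    "closed": [FuzzyPatch((0.0, 0.5, 1.0), (-0.5, 0.0, 0.5))],
}
trajectory = [(0.5, 0.9)]  # one (time, mouth-opening) sample
best = candidate_group(rules, trajectory, k=1)
```

Returning the top `k` labels instead of only the best one mirrors the abstract's point that visually similar speech sounds are hard to separate, so a short list of candidates is a more realistic output than a single word.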

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Anwar, M.A., Baldwin, J.F., Martin, T.P. (2004). Learning Fuzzy Rules for Visual Speech Recognition. In: Nürnberger, A., Detyniecki, M. (eds) Adaptive Multimedia Retrieval. AMR 2003. Lecture Notes in Computer Science, vol 3094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25981-7_11

  • DOI: https://doi.org/10.1007/978-3-540-25981-7_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22163-0

  • Online ISBN: 978-3-540-25981-7

  • eBook Packages: Springer Book Archive
