Learning Fuzzy Rules for Visual Speech Recognition

Anwar, M. A.; Baldwin, Jim F.; Martin, Trevor P.

doi:10.1007/978-3-540-25981-7_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3094)

Included in the following conference series:

International Workshop on Adaptive Multimedia Retrieval

Abstract

We outline a method to learn fuzzy rules for visual speech recognition. Such a system could be used to annotate video sequences automatically, to aid subsequent retrieval; it could also improve the recognition of voice commands on a system that has no keyboard. In the implemented system, features were extracted automatically from short video sequences by identifying regions of the face and tracking the movement of various points around the mouth from frame to frame. The words in the video sequences were segmented manually on phoneme boundaries, and a rule base was constructed using two-dimensional fuzzy sets over feature and time parameters. The method was applied to the Tulips1 database, and the results were slightly better than those obtained with techniques based on neural networks and Hidden Markov Models, which suggests that the learned rules are speaker independent. A medium-sized vocabulary of around 300 words, representative of the phonemes of the English language, was created and used for training and testing, and reasonable accuracy was achieved for phoneme classification. Because many speech sounds are ambiguous and similar to one another, a scheme was developed to select a group of words when a test word is presented to the system. The accuracy achieved was 21–33%, comparable to that of expert human lip-readers, whose accuracy on nonsense words is about 30%.
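The abstract does not specify how the two-dimensional fuzzy sets are learned or combined, so the following is only a minimal Python sketch, under stated assumptions, of the general idea: per-word fuzzy sets over (normalised time, feature value) are learned from training sequences, a test sequence is scored against every word, and a group of the best-matching words is returned. It is not the authors' method. The simplifications are assumptions: each word is treated as a single segment (no phoneme segmentation), each fuzzy set is a crude grid approximation obtained by scaling frame counts so the fullest cell has membership 1, and a word's score is the mean membership of the test frames. The names `learn_rules`, `score`, `candidate_group`, `T_BINS` and `F_BINS`, and the toy data, are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: classify a word from
# per-frame mouth features using 2-D fuzzy sets over (normalised time, feature).
T_BINS, F_BINS = 5, 5  # assumed grid resolution on the time and feature axes


def _bin(x, lo, hi, n):
    """Map x in [lo, hi] to a cell index 0..n-1, clipping values outside the range."""
    if hi <= lo:
        return 0
    return min(max(int((x - lo) / (hi - lo) * n), 0), n - 1)


def _time_bin(t, length):
    """Normalise frame index t of a `length`-frame sequence to [0, 1], then bin it."""
    return _bin(t / max(length - 1, 1), 0.0, 1.0, T_BINS)


def learn_rules(training, f_range):
    """training: {word: [sequence, ...]}, each sequence a list of feature tuples.

    Returns {word: [grid per feature]}, where grid[time_cell][feature_cell] is a
    membership in [0, 1] (frame counts scaled so the fullest cell has membership 1).
    """
    rules = {}
    for word, seqs in training.items():
        n_feat = len(seqs[0][0])
        grids = [[[0.0] * F_BINS for _ in range(T_BINS)] for _ in range(n_feat)]
        for seq in seqs:
            for t, frame in enumerate(seq):
                tb = _time_bin(t, len(seq))
                for k, x in enumerate(frame):
                    lo, hi = f_range[k]
                    grids[k][tb][_bin(x, lo, hi, F_BINS)] += 1
        # Scale each grid so its peak is 1, giving a possibility-style fuzzy set.
        for g in grids:
            peak = max(max(row) for row in g)
            for row in g:
                for j in range(F_BINS):
                    row[j] = row[j] / peak if peak else 0.0
        rules[word] = grids
    return rules


def score(grids, seq, f_range):
    """Mean membership of every (frame, feature) point of `seq` in a word's fuzzy sets."""
    total, count = 0.0, 0
    for t, frame in enumerate(seq):
        tb = _time_bin(t, len(seq))
        for k, x in enumerate(frame):
            lo, hi = f_range[k]
            total += grids[k][tb][_bin(x, lo, hi, F_BINS)]
            count += 1
    return total / count if count else 0.0


def candidate_group(rules, seq, f_range, size=3):
    """Return the `size` best-matching words, best first (ties keep insertion order)."""
    ranked = sorted(rules, key=lambda w: score(rules[w], seq, f_range), reverse=True)
    return ranked[:size]


if __name__ == "__main__":
    # Toy data: one feature (e.g. a normalised mouth opening) per frame.
    f_range = [(0.0, 1.0)]
    training = {
        "one": [[(0.1,), (0.5,), (0.9,)], [(0.15,), (0.55,), (0.85,)]],
        "two": [[(0.9,), (0.5,), (0.1,)]],
        "three": [[(0.5,), (0.5,), (0.5,)]],
    }
    rules = learn_rules(training, f_range)
    print(candidate_group(rules, [(0.12,), (0.52,), (0.88,)], f_range, size=2))
    # -> ['one', 'two']
```

Returning a ranked group rather than a single word mirrors the idea of offering several candidates for visually ambiguous input; the group size here is a free parameter, since the abstract does not say how many words the authors' scheme selects.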

Author information

Authors and Affiliations

AI Group, Department of Engineering Mathematics, University of Bristol, UK
M. A. Anwar, Jim F. Baldwin & Trevor P. Martin

Editor information

Editors and Affiliations

Institute for Knowledge and Language Engineering, Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, D-39106, Magdeburg, Germany
Andreas Nürnberger
Laboratoire d’Informatique de Paris 6,
Marcin Detyniecki

About this paper

Cite this paper

Anwar, M.A., Baldwin, J.F., Martin, T.P. (2004). Learning Fuzzy Rules for Visual Speech Recognition. In: Nürnberger, A., Detyniecki, M. (eds) Adaptive Multimedia Retrieval. AMR 2003. Lecture Notes in Computer Science, vol 3094. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25981-7_11

DOI: https://doi.org/10.1007/978-3-540-25981-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22163-0
Online ISBN: 978-3-540-25981-7
eBook Packages: Springer Book Archive
