
Continuous Automatic Speech Recognition by Lipreading

  • Chapter in Motion-Based Recognition

Part of the book series: Computational Imaging and Vision ((CIVI,volume 9))

Abstract

An automatic speechreading recognizer uses information about the motions produced by the oral-cavity regions of a speaker uttering a sentence. The ability to automatically ‘lipread’ a speaker from a sequence of image frames is an example of motion-based recognition.

This work has no reference to Mitre past or present.
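The chapter's premise is that the frame-to-frame motion of the oral-cavity region carries information usable for recognition. As a minimal illustration of that general idea (not the authors' actual feature set, which the chapter itself defines), the sketch below computes a simple per-transition "motion energy" feature from a sequence of mouth-region image patches by summing absolute frame differences; the function name and the toy data are hypothetical.

```python
import numpy as np

def motion_energy_features(frames):
    """Sum of absolute frame-to-frame differences over a mouth region of
    interest: one scalar motion-energy feature per frame transition.
    A stand-in for the richer oral-cavity features used in real systems."""
    frames = np.asarray(frames, dtype=float)      # (T, H, W) image sequence
    diffs = np.abs(np.diff(frames, axis=0))       # (T-1, H, W) per-pixel change
    return diffs.reshape(diffs.shape[0], -1).sum(axis=1)

# Toy sequence: 4 frames of an 8x8 "mouth" patch with varying intensities.
rng = np.random.default_rng(0)
seq = rng.random((4, 8, 8))
feats = motion_energy_features(seq)
print(feats.shape)  # one feature per consecutive frame pair -> (3,)
```

In a real lipreading pipeline such per-transition features would be computed over a tracked mouth region and fed to a sequence model (e.g. an HMM), but the core idea is the same: recognition operates on motion extracted from the image sequence, not on single frames.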





Copyright information

© 1997 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Goldschen, A.J., Garcia, O.N., Petajan, E.D. (1997). Continuous Automatic Speech Recognition by Lipreading. In: Shah, M., Jain, R. (eds) Motion-Based Recognition. Computational Imaging and Vision, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8935-2_14


  • DOI: https://doi.org/10.1007/978-94-015-8935-2_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-4870-7

  • Online ISBN: 978-94-015-8935-2

