
Continuous Automatic Speech Recognition by Lipreading

  • Chapter in Motion-Based Recognition

Part of the book series: Computational Imaging and Vision ((CIVI,volume 9))

Abstract

An automatic speechreading recognizer uses information about the motions produced by the oral-cavity regions of a speaker uttering a sentence. The ability to automatically ‘lipread’ a speaker from a sequence of image frames is an example of motion-based recognition.

This work has no reference to Mitre past or present.
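The chapter's premise is that the frame-to-frame motion of the oral-cavity region carries information usable for recognition. As a minimal illustration of that general idea (not the authors' actual feature set, which the chapter itself defines), the sketch below computes a simple per-transition "motion energy" feature from a sequence of mouth-region image patches by summing absolute frame differences; the function name and the toy data are hypothetical.

```python
import numpy as np

def motion_energy_features(frames):
    """Sum of absolute frame-to-frame differences over a mouth region of
    interest: one scalar motion-energy feature per frame transition.
    A stand-in for the richer oral-cavity features used in real systems."""
    frames = np.asarray(frames, dtype=float)      # (T, H, W) image sequence
    diffs = np.abs(np.diff(frames, axis=0))       # (T-1, H, W) per-pixel change
    return diffs.reshape(diffs.shape[0], -1).sum(axis=1)

# Toy sequence: 4 frames of an 8x8 "mouth" patch with varying intensities.
rng = np.random.default_rng(0)
seq = rng.random((4, 8, 8))
feats = motion_energy_features(seq)
print(feats.shape)  # one feature per consecutive frame pair -> (3,)
```

In a real lipreading pipeline such per-transition features would be computed over a tracked mouth region and fed to a sequence model (e.g. an HMM), but the core idea is the same: recognition operates on motion extracted from the image sequence, not on single frames.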





Copyright information

© 1997 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Goldschen, A.J., Garcia, O.N., Petajan, E.D. (1997). Continuous Automatic Speech Recognition by Lipreading. In: Shah, M., Jain, R. (eds) Motion-Based Recognition. Computational Imaging and Vision, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-8935-2_14


  • DOI: https://doi.org/10.1007/978-94-015-8935-2_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-4870-7

  • Online ISBN: 978-94-015-8935-2

