History and Development of Speech Recognition

Furui, Sadaoki

doi:10.1007/978-0-387-73819-2_1

Abstract

Speech is the primary means of communication between humans. For reasons ranging from technological curiosity about the mechanisms for mechanical realization of human speech capabilities to the desire to automate simple tasks which necessitate human–machine interactions, research in automatic speech recognition by machines has attracted a great deal of attention for five decades.

Author information

Authors and Affiliations

Department of Computer Science, Tokyo Institute of Technology, 2-12-1 Okayama, Meguro-ku, Tokyo, 152-8552, Japan
Sadaoki Furui

Corresponding author

Correspondence to Sadaoki Furui.

Editor information

Editors and Affiliations

Department of Computing Science & Engineering, Chalmers University of Technology, 412 96, Göteborg, Sweden
Fang Chen
Department of Speech Sciences, University of Helsinki, 9, FIN-00014, Helsinki, Finland
Kristiina Jokinen

About this chapter

Cite this chapter

Furui, S. (2010). History and Development of Speech Recognition. In: Chen, F., Jokinen, K. (eds) Speech Technology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-73819-2_1

DOI: https://doi.org/10.1007/978-0-387-73819-2_1
Published: 17 April 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-73818-5
Online ISBN: 978-0-387-73819-2
eBook Packages: Engineering, Engineering (R0)