Abstract

This contribution describes the challenges faced and the progress made in Verbmobil on the robustness of speech recognition under various adverse conditions, such as channel distortion, environmental noise, and varying speakers and speaking conditions. For the channel and noise problem, classical approaches such as cepstral bias normalization and spectral subtraction have been improved, as have newer methods such as parallel model combination. One major result is that an intelligent combination of several methods achieves the best results. Considerable progress has also been made in research on unsupervised speaker adaptation. Several approaches are presented for improving robustness against variations in speaking rate, speaking style, and speaker characteristics. The methods described include a new estimation of the parameters for vocal tract length normalization, feature and codebook transformation methods using ML algorithms, and pronunciation adaptation of the words in the lexicon.
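For readers unfamiliar with the two classical techniques named in the abstract, the following is a minimal textbook-style sketch, not the improved variants developed in Verbmobil. It assumes cepstral features are given as a list of per-frame coefficient lists and power spectra as lists of per-bin values. The function names and the parameters `alpha` (over-subtraction factor) and `beta` (spectral floor factor) are illustrative assumptions, not taken from the chapter.

```python
def cepstral_mean_normalize(frames):
    """Remove a constant cepstral bias by subtracting the per-utterance mean.

    A stationary channel acts as a convolution in the time domain, which
    becomes an additive offset in the cepstral domain, so subtracting the
    mean of each coefficient over the utterance cancels that offset.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - mean[d] for d in range(dim)] for frame in frames]


def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Basic power spectral subtraction for one frame.

    alpha scales the noise estimate before subtraction (hypothetical default);
    beta sets a floor relative to the noise estimate so that no bin becomes
    negative (hypothetical default).
    """
    return [
        max(y - alpha * n, beta * n)
        for y, n in zip(noisy_power, noise_power)
    ]
```

In this sketch, adding the same offset vector to every cepstral frame leaves the normalized output unchanged, which is the property that makes the method useful against channel distortion.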

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Haiber, U., Mangold, H., Pfau, T., Regel-Brietzmann, P., Ruske, G., Schleß, V. (2000). Robust Recognition of Spontaneous Speech. In: Wahlster, W. (eds) Verbmobil: Foundations of Speech-to-Speech Translation. Artificial Intelligence. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04230-4_4

  • DOI: https://doi.org/10.1007/978-3-662-04230-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-08730-1

  • Online ISBN: 978-3-662-04230-4

  • eBook Packages: Springer Book Archive
