Speech Coding and Recognition in Noisy Environments for Communication Terminals

Intelligent Integrated Media Communication Techniques

Abstract

This chapter addresses noise-robust speech processing for communication terminals that serve as front-ends to digital networks. Studying the limitations of auditory perception, particularly how masking reduces the information rate of the speech signal, makes it possible to improve the efficiency of (1) speaker and speech recognition and (2) wide-band speech coding. In the first case, speech enhancement techniques derived from spectral subtraction are used not only for noise reduction, which is often unreliable, but also for the detection of missing features, that is, features masked by noise or otherwise unreliable. We show that this detection technique can be combined with missing-feature compensation in the statistical models (Gaussian Mixture Models (GMMs) and Hidden Markov Models (HMMs)) to improve recognition results. In the second case, the spectral subtraction technique is used to design an integrated speech enhancement/coding system that incorporates masking of both ambient noise and quantization noise. The advantage of the method presented in this chapter over previous approaches is that perceptual enhancement and coding, usually implemented as a cascade of two separate systems, are combined. This decreases the computational load while controlling the bit rate and maintaining acceptable speech intelligibility and quality.
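To make the first case more concrete, the following is a minimal sketch of how spectral subtraction could double as a missing-feature detector, and how a diagonal-covariance GMM could then be scored on the reliable features only (marginalization). It is an illustration under stated assumptions, not the chapter's actual algorithm: the function names, the over-subtraction factor `alpha`, the spectral floor `beta`, and the local-SNR threshold are all hypothetical choices, and the noise estimate is assumed to be given.

```python
import numpy as np

def spectral_subtraction_mask(noisy_mag, noise_mag, alpha=2.0, beta=0.01,
                              snr_threshold_db=0.0):
    """Enhance magnitude spectra and flag reliable time-frequency cells.

    noisy_mag: (frames, bins) magnitude spectra of the noisy speech.
    noise_mag: (bins,) noise magnitude estimate (assumed known here).
    alpha, beta, snr_threshold_db: illustrative values, not from the chapter.
    """
    noisy_pow = noisy_mag ** 2
    noise_pow = noise_mag ** 2
    # Over-subtraction with a spectral floor (generalized spectral subtraction).
    clean_pow = np.maximum(noisy_pow - alpha * noise_pow, beta * noisy_pow)
    # Local SNR estimate; cells below the threshold are treated as missing.
    speech_pow = np.maximum(noisy_pow - noise_pow, 1e-12)
    snr_db = 10.0 * np.log10(speech_pow / np.maximum(noise_pow, 1e-12))
    reliable = snr_db > snr_threshold_db
    return np.sqrt(clean_pow), reliable

def marginal_log_likelihood(x, reliable, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal GMM,
    integrating out (ignoring) the dimensions flagged as unreliable.

    x: (dims,), reliable: (dims,) bool, weights: (K,),
    means and variances: (K, dims).
    """
    xr = x[reliable]
    mu = means[:, reliable]
    var = variances[:, reliable]
    log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (xr - mu) ** 2 / var,
                             axis=1)
    a = np.log(weights) + log_comp
    m = a.max()
    # Log-sum-exp over mixture components for numerical stability.
    return m + np.log(np.sum(np.exp(a - m)))
```

In this sketch, marginalization works for a diagonal-covariance GMM simply by dropping the unreliable dimensions from each component density, so a feature masked by noise contributes nothing to the score instead of contributing a corrupted value.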

Copyright information

© 2003 Kluwer Academic Publishers

About this chapter

Cite this chapter

Drygajlo, A. (2003). Speech Coding and Recognition in Noisy Environments for Communication Terminals. In: Tasič, J.F., Najim, M., Ansorge, M. (eds) Intelligent Integrated Media Communication Techniques. Springer, Boston, MA. https://doi.org/10.1007/0-306-48718-7_10

  • DOI: https://doi.org/10.1007/0-306-48718-7_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4020-7552-0

  • Online ISBN: 978-0-306-48718-7

  • eBook Packages: Springer Book Archive
