Perceptual Audio Coding of Speech Signals

Springer Handbook of Speech Processing

Part of the book series: Springer Handbooks (SHB)

Abstract

Traditionally, speech coding algorithms exploit the features of speech signals by employing algorithmic models of the human vocal tract. More recently, generic audio coders have become increasingly important for coding speech signals. Because they are based on the properties of human hearing rather than on a model of speech production, such perceptual audio coders offer attractive properties, including full-bandwidth audio output, increased naturalness, and good handling of any type of non-speech material. The chapter discusses the principles of perceptual audio coding, some relevant standards, and a number of perceptual audio coders that find application in speech and audio transmission and storage.

Abbreviations

  • ACELP: algebraic code-excited linear prediction
  • AMR-WB+: extended wide-band adaptive multirate coder
  • AMR-WB: wide-band AMR speech coder
  • CELP: code-excited linear prediction
  • DSP: digital signal processing
  • ERB: equivalent rectangular bandwidth
  • FSS: frequency selective switch
  • GSM: Groupe Spécial Mobile
  • HPF: high-pass filter
  • IFSS: inverse frequency selective switch
  • IMDCT: inverse MDCT
  • IQMF: QMF synthesis filterbank
  • ITU: International Telecommunication Union
  • LPC: linear predictive coding
  • LSR: low sampling rates
  • LTP: long-term prediction
  • MDCT: modified discrete cosine transform
  • MPEG: Moving Picture Experts Group
  • MSE: mean-square error
  • NLMS: normalized least-mean-square
  • QMF: quadrature mirror filter
  • SNR: signal-to-noise ratio
  • TCX: transform coded excitation
  • TDAC: time-domain aliasing cancellation
  • TDBWE: time-domain bandwidth extension
  • TNS: temporal noise shaping
  • ULD: ultra-low delay

Author information

Corresponding authors

Correspondence to Dr. Jürgen Herre or Manfred Lutzky, M.Sc.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Herre, J., Lutzky, M. (2008). Perceptual Audio Coding of Speech Signals. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_18

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
