Perceptual Audio Coding of Speech Signals

Herre, Jürgen; Lutzky, Manfred

doi:10.1007/978-3-540-49127-9_18

Part of the book series: Springer Handbooks (SHB)

Abstract

Traditionally, speech coding algorithms exploit the features of speech signals by employing algorithmic models of the human vocal tract. More recently, the use of generic audio coders for coding speech signals has gained increasing importance. Because they are based on the properties of human hearing, such perceptual audio coders offer attractive properties, including full-bandwidth audio output, increased naturalness, and good handling of any type of non-speech material. The chapter discusses the principles of perceptual audio coding, some relevant standards, and a number of perceptual audio coders that find application in speech and audio transmission and storage.

Abbreviations

ACELP:: algebraic code excited linear prediction
AMR-WB+:: extended wide-band adaptive multirate coder
AMR-WB:: wide-band AMR speech coder
CELP:: code-excited linear prediction
DSP:: digital signal processing
ERB:: equivalent rectangular bandwidth
FSS:: frequency selective switch
GSM:: Groupe Spécial Mobile
HPF:: high-pass filter
IFSS:: inverse frequency selective switch
IMDCT:: inverse MDCT
IQMF:: QMF synthesis filterbank
ITU:: International Telecommunication Union
LPC:: linear predictive coding
LSR:: low sampling rates
LTP:: long term prediction
MDCT:: modified discrete cosine transform
MPEG:: Moving Picture Experts Group
MSE:: mean-square error
NLMS:: normalized least-mean-square
QMF:: quadrature mirror filter
SNR:: signal-to-noise ratio
TCX:: transform coded excitation
TDAC:: time-domain aliasing cancelation
TDBWE:: time-domain bandwidth extension
TNS:: temporal noise shaping
ULD:: ultra-low delay

Author information

Authors and Affiliations

Audio and Multimedia, Fraunhofer Institute for Integrated Circuits (Fraunhofer IIS), Am Wolfsmantel 33, 91058, Erlangen, Germany
Jürgen Herre Dr.
Multimedia Realtime Systems, Fraunhofer Institute for Integrated Circuits (Fraunhofer IIS), Am Wolfsmantel 33, 91058, Erlangen, Germany
Manfred Lutzky M.Sc.

Corresponding authors

Correspondence to Jürgen Herre Dr. or Manfred Lutzky M.Sc.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Herre, J., Lutzky, M. (2008). Perceptual Audio Coding of Speech Signals. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_18

DOI: https://doi.org/10.1007/978-3-540-49127-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
