Abstract
In general, audio coding (audio compression) algorithms produce compact digital representations of high-quality audio signals for efficient transmission and storage. The central objective of audio coding is to represent the signal with a minimum number of bits while still reproducing it transparently. Alongside speech coding schemes based on linear prediction, which are tailored specifically for efficient speech compression, perceptual transform-based audio coding schemes have gained greater attention, particularly for applications in consumer electronics. Typically, a transform-based audio coding scheme uses a near-perfect reconstruction quadrature mirror filter (QMF) bank and/or a perfect reconstruction cosine-modulated filter bank to obtain a block-wise representation of the audio signal in the frequency domain. This chapter briefly reviews the perceptual transform-based audio coding schemes developed to date, including the family of ISO/IEC MPEG audio coding standards, proprietary audio compression algorithms, broadcasting/speech/data communication codecs, and open-source, royalty-free audio/speech codecs. The discussion concentrates on the adopted near-perfect reconstruction QMF banks and perfect reconstruction cosine-modulated filter banks, the processing methods, and the specified transform block sizes.
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Britanak, V., Rao, K.R. (2018). Audio Coding Standards, (Proprietary) Audio Compression Algorithms, and Broadcasting/Speech/Data Communication Codecs: Overview of Adopted Filter Banks. In: Cosine-/Sine-Modulated Filter Banks. Springer, Cham. https://doi.org/10.1007/978-3-319-61080-1_2
DOI: https://doi.org/10.1007/978-3-319-61080-1_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61078-8
Online ISBN: 978-3-319-61080-1
eBook Packages: Engineering, Engineering (R0)