Bio-Inspired Sparse Representation of Speech and Audio Using Psychoacoustic Adaptive Matching Pursuit

Petrovsky, Alexey; Herasimovich, Vadzim; Petrovsky, Alexander

doi:10.1007/978-3-319-43958-7_18


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9811)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

This paper is devoted to sparse modelling of audio and speech signals via the matching pursuit (MP) algorithm. A redundant dictionary of time-frequency functions is constructed using a frame-based, psychoacoustically optimized wavelet packet (WP) transform. Anthropomorphic adaptation of the time-frequency plane minimizes the perceptual redundancy of the signal model. Psychoacoustic information is used at the MP stage to select the best atom from the dictionary. This improves the algorithm's performance both with respect to the human auditory system and in terms of computational complexity. The described signal model can be applied to many audio and speech processing tasks, such as source separation, watermarking, and classification. The present research focuses on signal encoding: a universal audio/speech coding algorithm suitable for input signals with different sound content is proposed.
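The abstract describes greedy selection of atoms from a redundant time-frequency dictionary, guided by psychoacoustic information. The sketch below shows plain matching pursuit in the sense of Mallat and Zhang. It adds an optional per-atom weight that affects only which atom is selected, and the coefficient is still the ordinary projection. The `weights` argument is a hypothetical stand-in for the paper's psychoacoustic criterion, which this page does not specify. The toy dictionary in the usage note (Dirac spikes plus a 4-point Haar basis) is illustrative only and is not the paper's psychoacoustically optimized wavelet packet dictionary.

```python
import math
from typing import List, Optional, Sequence, Tuple


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit L2 norm (matching pursuit assumes unit-norm atoms)."""
    n = math.sqrt(inner(v, v))
    if n == 0.0:
        raise ValueError("zero-norm atom")
    return [x / n for x in v]


def matching_pursuit(
    signal: Sequence[float],
    dictionary: Sequence[Sequence[float]],
    n_atoms: int,
    weights: Optional[Sequence[float]] = None,
    tol: float = 1e-12,
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Greedy matching pursuit over a (possibly redundant) dictionary.

    Returns (atom index, coefficient) pairs and the final residual.
    If `weights` is given (hypothetical psychoacoustic stand-in), atom k is
    chosen by maximizing weights[k] * |<r, g_k>|; the coefficient is still
    the plain projection <r, g_k>.
    """
    atoms = [normalize(g) for g in dictionary]
    w = list(weights) if weights is not None else [1.0] * len(atoms)
    residual = list(signal)
    decomposition: List[Tuple[int, float]] = []
    for _ in range(n_atoms):
        if inner(residual, residual) <= tol:
            break  # residual energy negligible: stop early
        projections = [inner(residual, g) for g in atoms]
        # Weighted selection; ties go to the lowest index
        k = max(range(len(atoms)), key=lambda i: w[i] * abs(projections[i]))
        c = projections[k]
        decomposition.append((k, c))
        # Remove the chosen atom's contribution from the residual
        residual = [r - c * g for r, g in zip(residual, atoms[k])]
    return decomposition, residual


def reconstruct(
    decomposition: Sequence[Tuple[int, float]],
    dictionary: Sequence[Sequence[float]],
    length: int,
) -> List[float]:
    """Sum coefficient * unit-norm atom over the selected atoms."""
    atoms = [normalize(g) for g in dictionary]
    out = [0.0] * length
    for k, c in decomposition:
        for i, g in enumerate(atoms[k]):
            out[i] += c * g
    return out
```

Usage: for the constant signal `[3, 3, 3, 3]` over the union of the four Dirac spikes and the Haar vectors `[1,1,1,1]`, `[1,1,-1,-1]`, `[1,-1,0,0]`, `[0,0,1,-1]`, unweighted MP picks the single Haar DC atom (index 4) with coefficient 6.0. Down-weighting the Haar atoms to 0.1 makes the pursuit choose the four spikes instead. The example shows only that the weights steer atom selection. It says nothing about which weighting is perceptually correct.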



Acknowledgement

This work was supported by the ITForYou company.

Author information

Authors and Affiliations

Belarusian State University of Informatics and Radioelectronics, Minsk, Belarus
Alexey Petrovsky, Vadzim Herasimovich & Alexander Petrovsky


Corresponding author

Correspondence to Vadzim Herasimovich.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
Budapest University of Technology and Economics, Budapest, Hungary
Géza Németh


About this paper

Cite this paper

Petrovsky, A., Herasimovich, V., Petrovsky, A. (2016). Bio-Inspired Sparse Representation of Speech and Audio Using Psychoacoustic Adaptive Matching Pursuit. In: Ronzhin, A., Potapova, R., Németh, G. (eds) Speech and Computer. SPECOM 2016. Lecture Notes in Computer Science, vol 9811. Springer, Cham. https://doi.org/10.1007/978-3-319-43958-7_18


DOI: https://doi.org/10.1007/978-3-319-43958-7_18
Published: 13 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43957-0
Online ISBN: 978-3-319-43958-7
eBook Packages: Computer Science, Computer Science (R0)
