Warped and Warped-Twice MVDR Spectral Estimation With and Without Filterbanks

Wölfel, Matthias

doi:10.1007/11965152_24


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4299)

Included in the following conference series:

International Workshop on Machine Learning for Multimodal Interaction


Abstract

This paper describes a novel extension to warped minimum variance distortionless response (MVDR) spectral estimation that makes it possible to steer the resolution of the spectral envelope estimate toward lower or higher frequencies while keeping both the overall resolution of the estimate and the frequency axis fixed. The effect is achieved by introducing a second bilinear transformation into warped MVDR spectral estimation, applied in the frequency domain (the first bilinear transformation is applied in the time domain), together with a compensation step that adjusts for the pre-emphasis introduced by both bilinear transformations. In the feature extraction stage of an automatic speech recognition system, this extension makes it possible to emphasize classification-relevant characteristics of the speech features and to drop classification-irrelevant ones, according to the characteristics of the signal being analyzed.

We have compared the novel extension with warped MVDR and with traditional Mel frequency cepstral coefficients (MFCC) on development and evaluation data of the Rich Transcription 2005 Spring Meeting Recognition Evaluation lecture meeting task. The results are promising, and we plan to use the described warped and warped-twice front-end settings in the upcoming NIST evaluation.
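To make the role of the bilinear transformation in the abstract more concrete, the following minimal sketch shows how a single first-order all-pass (bilinear) transformation remaps the frequency axis and how the sign of its warp parameter shifts resolution toward lower or higher frequencies. It is only an illustration of the standard frequency mapping, not the warped-twice MVDR procedure or its compensation step, which the abstract does not specify. The function names `warp_frequency` and `warp_slope` and the value `alpha = 0.4` are assumptions chosen for this example.

```python
import math


def warp_frequency(omega, alpha):
    """Map a normalized frequency omega (radians, 0..pi) through a
    first-order all-pass (bilinear) transformation with warp parameter
    alpha, where |alpha| < 1. Hypothetical helper for illustration."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_slope(omega, alpha):
    """Derivative of the warped frequency with respect to omega.
    A slope above 1 means the region is stretched (more resolution),
    a slope below 1 means it is compressed (less resolution)."""
    return (1.0 - alpha ** 2) / (1.0 + alpha ** 2 - 2.0 * alpha * math.cos(omega))


if __name__ == "__main__":
    alpha = 0.4  # illustrative value, not taken from the paper
    for label, a in (("positive alpha", alpha), ("negative alpha", -alpha)):
        low = warp_slope(0.0, a)       # local stretch at the lowest frequency
        high = warp_slope(math.pi, a)  # local stretch at the highest frequency
        print(f"{label}: slope at 0 = {low:.3f}, slope at pi = {high:.3f}")
```

In this sketch the endpoints 0 and π map to themselves, so the frequency range is preserved while resolution is redistributed inside it: a positive `alpha` stretches the low-frequency region and compresses the high-frequency region, and a negative `alpha` does the opposite.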



Author information

Authors and Affiliations

Institut für Theoretische Informatik, Universität Karlsruhe (TH), Am Fasanengarten 5, 76131, Karlsruhe, Germany
Matthias Wölfel


Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, Scotland
Steve Renals
IDIAP Research Institute, Martigny, Switzerland
Samy Bengio
National Institute Of Standards and Technology, 100 Bureau Drive Stop 8940, Gaithersburg, MD, 20899
Jonathan G. Fiscus


Cite this paper

Wölfel, M. (2006). Warped and Warped-Twice MVDR Spectral Estimation With and Without Filterbanks. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_24


DOI: https://doi.org/10.1007/11965152_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science (R0)
