Codebook Design for Speech Guided Car Infotainment Systems

Raab, Martin; Gruhn, Rainer; Noeth, Elmar

doi:10.1007/978-3-540-69369-7_6

Martin Raab^1,2, Rainer Gruhn^1,3 & Elmar Noeth^2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5078)

Included in the following conference series:

International Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems

Abstract

In car infotainment systems, commands and other words in the user’s main language must be recognized with maximum accuracy, but it should also be possible to use foreign names, which frequently occur in music titles and city names. Previous approaches did not address the constraint of preserving main-language performance when extending their systems to cover multilingual input.

In this paper we present an approach for recognizing speech in multiple languages with the constrained resources of embedded devices. To date, speech recognizers on such systems are typically semi-continuous recognizers, which are based on vector quantization.
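
To make the term concrete: in vector quantization, each feature vector is mapped to the index of its nearest codebook entry. The following is a minimal sketch using plain Euclidean distance. The function name `quantize` and the toy two-dimensional codebook are illustrative assumptions and are not taken from the paper; semi-continuous recognizers typically use Gaussian codebook entries rather than bare centroids.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry closest to `vector`.

    Euclidean distance is an assumption made for this sketch.
    """
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


# Toy codebook (hypothetical values, for illustration only).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
print(quantize((0.9, 1.2), codebook))
```

The cost of this lookup grows with the number of codebook entries, which is one reason codebook size matters on resource-constrained devices.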

We provide evidence that common vector quantization algorithms are not optimal for such systems when they have to cope with input from multiple languages. Our new method combines information from multiple languages and creates a new codebook that can be used for efficient vector quantization in multilingual scenarios. Experiments show significantly improved speech recognition results.
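
One way to picture a codebook that combines several languages while leaving the main language untouched is sketched below. This is an illustrative assumption, not the algorithm from the paper: every main-language entry is kept, and the additional-language entries that lie farthest from the current codebook are appended one at a time. The name `extend_codebook`, the Euclidean distance, and the toy data are all hypothetical.

```python
import math


def extend_codebook(main, extra, n_add):
    """Keep every entry of `main` and append up to `n_add` entries from `extra`.

    At each step, the candidate whose nearest neighbour in the current
    codebook is farthest away is added (a greedy farthest-point heuristic).
    `main` must be non-empty. This is a sketch, not the paper's method.
    """
    combined = list(main)
    candidates = list(extra)
    for _ in range(min(n_add, len(candidates))):
        best = max(candidates, key=lambda c: min(math.dist(c, e) for e in combined))
        combined.append(best)
        candidates.remove(best)
    return combined


# Toy codebooks (hypothetical values, for illustration only).
main_lang = [(0.0, 0.0), (1.0, 1.0)]
other_lang = [(0.1, 0.1), (5.0, 5.0), (1.0, 3.0)]
print(extend_codebook(main_lang, other_lang, 2))
```

In this sketch the main-language entries are carried over unchanged, which loosely mirrors the stated requirement that main-language performance be conserved; the entry at (0.1, 0.1) is skipped because the main-language codebook already covers that region.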

Author information

Authors and Affiliations

Harman Becker Automotive Systems, Speech Dialog Systems, Ulm, Germany
Martin Raab & Rainer Gruhn
Dept. of Pattern Recognition, University of Erlangen, Erlangen, Germany
Martin Raab & Elmar Noeth
Dept. of Information Technology, University of Ulm, Ulm, Germany
Rainer Gruhn

Editor information

Elisabeth André, Laila Dybkjær, Wolfgang Minker, Heiko Neumann, Roberto Pieraccini, Michael Weber

Cite this paper

Raab, M., Gruhn, R., Noeth, E. (2008). Codebook Design for Speech Guided Car Infotainment Systems. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds) Perception in Multimodal Dialogue Systems. PIT 2008. Lecture Notes in Computer Science, vol 5078. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69369-7_6

DOI: https://doi.org/10.1007/978-3-540-69369-7_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69368-0
Online ISBN: 978-3-540-69369-7
eBook Packages: Computer Science, Computer Science (R0)
