Spoken Language Identification Using Language Bottleneck Features

  • Conference paper
Text, Speech, and Dialogue (TSD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11697)

Abstract

In this paper, we introduce a novel approach to Language Identification (LID). Two commonly used state-of-the-art methods based on the UBM/GMM i-vector technique, combined with a back-end classifier, are first evaluated. The two methods differ in the input features used to train the UBM/GMM models: conventional MFCCs, or deep Bottleneck Features (BNF) extracted from a neural network. By analogy with successful algorithms developed for speaker recognition, this paper proposes training the BNF classifier directly on language targets rather than on conventional phone targets (i.e., phones of an international phone alphabet). We show that, when tested on four languages from the SpeechDat databases, the proposed approach reduces the number of targets by 96%, which leads to a 94% reduction in the time needed to train the BNF classifier. On average, we achieve a relative improvement of approximately 35% in terms of both the average cost \(C_{avg}\) and the Language Error Rate (LER), across all test duration conditions.
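The abstract reports results as the average cost \(C_{avg}\) and the Language Error Rate (LER). The sketch below shows one way to compute both from closed-set decisions, using the NIST LRE07 cost definition with C_Miss = C_FA = 1 and P_Target = 0.5. It assumes hard decisions, where the detector for a language accepts a segment only if that language is the top-scoring one, and it assumes LER is the fraction of misclassified test segments. The function names (`confusion_matrix`, `c_avg`, `ler`) are hypothetical; this is an illustration, not the authors' evaluation code.

```python
from typing import List, Sequence


def confusion_matrix(true_ids: Sequence[int], pred_ids: Sequence[int],
                     n_lang: int) -> List[List[int]]:
    """counts[t][p] = number of test segments of true language t classified as p."""
    counts = [[0] * n_lang for _ in range(n_lang)]
    for t, p in zip(true_ids, pred_ids):
        counts[t][p] += 1
    return counts


def c_avg(counts: List[List[int]], p_target: float = 0.5,
          c_miss: float = 1.0, c_fa: float = 1.0) -> float:
    """Average detection cost in the style of NIST LRE07.

    Assumption: hard closed-set decisions, i.e. the detector for language L
    accepts a segment if and only if L is the predicted (top-scoring) language.
    """
    n = len(counts)
    seg_per_lang = [sum(row) for row in counts]
    if n < 2 or min(seg_per_lang) == 0:
        raise ValueError("need >= 2 languages, each with at least one segment")
    total = 0.0
    for lt in range(n):
        # Miss: a segment of target language lt that was not labelled lt.
        p_miss = 1.0 - counts[lt][lt] / seg_per_lang[lt]
        cost = c_miss * p_target * p_miss
        for ln in range(n):
            if ln == lt:
                continue
            # False alarm: a segment of non-target language ln labelled lt.
            p_fa = counts[ln][lt] / seg_per_lang[ln]
            cost += c_fa * (1.0 - p_target) / (n - 1) * p_fa
        total += cost
    return total / n


def ler(counts: List[List[int]]) -> float:
    """Language Error Rate, assumed here to be the fraction of misclassified segments."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return 1.0 - correct / total


if __name__ == "__main__":
    # Toy two-language example (made-up labels, not results from the paper).
    counts = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_lang=2)
    print(counts, c_avg(counts), ler(counts))  # [[1, 1], [0, 2]] 0.25 0.25
```

A relative improvement is (baseline - proposed) / baseline, so a 35% relative gain means the proposed system's \(C_{avg}\) or LER is roughly 0.65 times the baseline value.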

Acknowledgement

This work was partially supported by several industrial projects at Idiap and the China Scholarship Council.

Author information

Corresponding author

Correspondence to Qingran Zhan.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Grisard, M., Motlicek, P., Allouchi, W., Baeriswyl, M., Lazaridis, A., Zhan, Q. (2019). Spoken Language Identification Using Language Bottleneck Features. In: Ekštein, K. (ed.) Text, Speech, and Dialogue. TSD 2019. Lecture Notes in Computer Science, vol 11697. Springer, Cham. https://doi.org/10.1007/978-3-030-27947-9_32

  • DOI: https://doi.org/10.1007/978-3-030-27947-9_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27946-2

  • Online ISBN: 978-3-030-27947-9

  • eBook Packages: Computer Science, Computer Science (R0)
