Speaking Rate Estimation Based on Deep Neural Networks

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

In this paper we propose a method for estimating speaking rate with Deep Neural Networks (DNNs). The proposed approach is used for speaking rate adaptation of an automatic speech recognition system. The adaptation is performed by changing the step in front-end feature processing according to the estimated speaking rate. Experiments show that adaptation with the proposed DNN-based speaking rate estimator gives better results than adaptation with a speaking rate estimator based on recognition results.
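The abstract does not give the adaptation formula, so the following is only a minimal sketch of the general idea, assuming that "step" means the frame shift used when cutting the signal into analysis windows. The function names, the reference rate of 13 phones per second, the 10 ms base step, and the clamping limits are all hypothetical values chosen for illustration, not taken from the paper.

```python
import numpy as np


def adapted_frame_step(rate, ref_rate=13.0, base_step_ms=10.0,
                       min_ms=5.0, max_ms=15.0):
    """Return a frame step in ms scaled inversely to the speaking rate.

    rate and ref_rate are in phones per second (hypothetical units).
    Faster speech gives a smaller step, slower speech a larger one,
    clamped to the range [min_ms, max_ms].
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    step = base_step_ms * ref_rate / rate
    return float(min(max(step, min_ms), max_ms))


def frame_signal(signal, sample_rate, step_ms, win_ms=25.0):
    """Cut a 1-D NumPy signal into overlapping frames with the given step."""
    win = int(round(sample_rate * win_ms / 1000.0))
    step = int(round(sample_rate * step_ms / 1000.0))
    if len(signal) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(signal) - win) // step
    # Row i holds the samples [i * step, i * step + win).
    idx = np.arange(win)[None, :] + step * np.arange(n_frames)[:, None]
    return signal[idx]


if __name__ == "__main__":
    audio = np.zeros(16000)  # one second of silence at 16 kHz
    for rate in (6.5, 13.0, 26.0):
        step_ms = adapted_frame_step(rate)
        frames = frame_signal(audio, 16000, step_ms)
        print(rate, step_ms, frames.shape)
```

With this scaling, an utterance spoken at twice the reference rate is framed with half the step, so it produces roughly as many feature frames per phone as speech at the reference rate. In a real system the rate would come from an estimator such as the DNN-based one the paper proposes.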

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tomashenko, N., Khokhlov, Y. (2014). Speaking Rate Estimation Based on Deep Neural Networks. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_52

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)