Proportional-Integral-Derivative Control of Automatic Speech Recognition Speed

  • Conference paper
Speech and Computer (SPECOM 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Abstract

We propose a technique for regulating the decoding speed of large-vocabulary continuous speech recognition (LVCSR) based on a proportional-integral-derivative (PID) model that is widely used in automatic control theory. Our experiments show that such a controller can maintain a given decoding speed level despite computer performance fluctuations, difficult acoustic conditions, or speech material that is out of the scope of the language model, without notable deterioration in overall recognition quality.
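The abstract does not give the controller equations, so the following is only a minimal sketch of the general idea, assuming a textbook discrete PID loop that adjusts a pruning beam width to hold a target real-time factor (RTF). The controlled variable (beam width), the gains, and the linear cost model `rtf = load * beam / 10` are all hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PIDController:
    """Textbook discrete PID controller (not the paper's exact formulation)."""
    kp: float
    ki: float
    kd: float
    setpoint: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measurement: float, dt: float = 1.0) -> float:
        """Return the control output for one new measurement."""
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target_rtf: float = 1.0, steps: int = 200) -> List[float]:
    """Run the loop against a toy decoder and return the measured RTF per step.

    Assumed toy model: decoding cost grows linearly with the beam width,
    and the machine becomes twice as slow halfway through the run.
    """
    pid = PIDController(kp=1.0, ki=2.0, kd=0.5, setpoint=target_rtf)
    base_beam = 15.0              # hypothetical initial beam width
    beam = base_beam
    history: List[float] = []
    for t in range(steps):
        load = 1.0 if t < steps // 2 else 2.0   # simulated performance drop
        rtf = load * beam / 10.0                # hypothetical cost model
        history.append(rtf)
        # Positional PID: the beam is the base value plus the control output,
        # clamped to a plausible range so it can never become non-positive.
        beam = min(50.0, max(1.0, base_beam + pid.update(rtf)))
    return history


if __name__ == "__main__":
    rtf = simulate()
    print(f"RTF just before the slowdown: {rtf[99]:.3f}")
    print(f"RTF right after the slowdown: {rtf[100]:.3f}")
    print(f"RTF at the end of the run:    {rtf[-1]:.3f}")
```

In this toy setting the integral term is what removes the steady-state offset after the simulated slowdown: the beam narrows until the measured RTF returns to the target. A real decoder responds nonlinearly and noisily to beam changes, so gains would have to be tuned on actual measurements.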

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Zatvornitsky, A., Romanenko, A., Korenevsky, M. (2014). Proportional-Integral-Derivative Control of Automatic Speech Recognition Speed. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_45

  • DOI: https://doi.org/10.1007/978-3-319-11581-8_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11580-1

  • Online ISBN: 978-3-319-11581-8

  • eBook Packages: Computer Science, Computer Science (R0)
