Proportional-Integral-Derivative Control of Automatic Speech Recognition Speed

Zatvornitsky, Alexander; Romanenko, Aleksei; Korenevsky, Maxim

doi:10.1007/978-3-319-11581-8_45

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

We propose a technique for regulating the decoding speed of large-vocabulary continuous speech recognition (LVCSR), based on the proportional-integral-derivative (PID) model that is widely used in automatic control theory. Our experiments show that such a controller can maintain a given decoding speed level despite computer performance fluctuations, difficult acoustic conditions, or speech material that is out of the scope of the language model, without notable deterioration in overall recognition quality.
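
As an illustration of the general idea only, the following minimal Python sketch shows a textbook discrete PID loop holding a real-time factor (RTF) at a target value. The abstract does not say which decoder parameter the authors adjust or how they tune the controller, so everything concrete here is an assumption: the choice of beam width as the control variable, the linear `toy_rtf` cost model, the `load` factor standing in for CPU or acoustic difficulty, and the gain values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller in positional form (unit time step)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float) -> float:
        # Accumulate the error (I term) and difference it (D term).
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def toy_rtf(beam: float, load: float) -> float:
    # Hypothetical plant model: decoding time grows linearly with the beam
    # width and with an external load factor. Not taken from the paper.
    return load * beam / 10.0


def regulate(target_rtf: float, loads: list[float], base_beam: float = 14.0,
             beam_min: float = 4.0, beam_max: float = 20.0) -> list[float]:
    """Return the RTF measured at each step while the PID adjusts the beam."""
    pid = PID(kp=1.0, ki=2.0, kd=0.2)  # illustrative gains, not from the paper
    beam = base_beam
    measured = []
    for load in loads:
        rtf = toy_rtf(beam, load)
        measured.append(rtf)
        # Positive error: decoding is faster than required, so widen the beam.
        # Negative error: decoding is too slow, so narrow it.
        correction = pid.update(target_rtf - rtf)
        beam = min(beam_max, max(beam_min, base_beam + correction))
    return measured
```

Calling `regulate(1.0, [1.0] * 60 + [2.0] * 60)` simulates a load that doubles halfway through. In this toy model the measured RTF jumps when the load changes and then settles back near the target as the controller narrows the beam. A real decoder would respond far less predictably, so the gains would need to be tuned on real data.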

Author information

Authors and Affiliations

Speech Technology Center, Saint-Petersburg, Russia
Alexander Zatvornitsky & Maxim Korenevsky
ITMO University, Saint-Petersburg, Russia
Aleksei Romanenko

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Zatvornitsky, A., Romanenko, A., Korenevsky, M. (2014). Proportional-Integral-Derivative Control of Automatic Speech Recognition Speed. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_45

DOI: https://doi.org/10.1007/978-3-319-11581-8_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)