A voice command system for AUTONOMY using a novel speech alignment algorithm

Hickersberger, Helmut; Zagler, Wolfgang L.

doi:10.1007/s10772-013-9196-2

Published: 25 April 2013

Volume 16, pages 461–469 (2013)

International Journal of Speech Technology

Abstract

The Viterbi dynamic programming algorithm is currently the de facto standard by which speech recognizers handle duration variations of sub-word units of speech: it aligns the sub-word units of an utterance to the corresponding sub-word unit models, and it is an integral part of hidden Markov model (HMM) speech recognizers. In this work, a simple and robust voice command system is developed, implemented and tested that instead uses a novel speech alignment algorithm, the so-called “run-length limited dynamic programming algorithm” (RLL-DP). The system uses isolated words as voice commands and facilitates the operation of AUTONOMY, an environmental control system combined with an alternative and augmentative communication system. Activating the “run-length limits” yields a statistically significant reduction of the word error rate, even when simple “centroid sequence word models” are used in place of the acoustic models based on “hidden control neural networks” used in previous versions.
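The abstract does not spell out how RLL-DP works, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that a “centroid sequence word model” is a left-to-right list of feature-space centroids, that the local cost is the squared Euclidean distance between a frame and a centroid, and that a “run-length limit” means each centroid must absorb between `min_run` and `max_run` consecutive frames. With `min_run=1` and no upper limit, the recursion reduces to a Viterbi-style best path through a strict left-to-right model with self-loops and no transition penalties. The names `frame_cost`, `align`, `recognize`, `min_run` and `max_run` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def frame_cost(frame: Vector, centroid: Vector) -> float:
    """Assumed local cost: squared Euclidean distance between a frame and a centroid."""
    return sum((f - c) ** 2 for f, c in zip(frame, centroid))


def align(frames: Sequence[Vector], centroids: Sequence[Vector],
          min_run: int = 1, max_run: Optional[int] = None) -> Tuple[float, List[int]]:
    """Left-to-right DP alignment of frames to a centroid sequence (hypothetical sketch).

    Every centroid covers one contiguous run of frames whose length lies in
    [min_run, max_run]; max_run=None leaves the upper limit open. Returns the
    total cost and the run length of each centroid, or (inf, []) if no
    alignment satisfies the limits.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    T, K = len(frames), len(centroids)
    if max_run is None:
        max_run = T
    # prefix[k][t]: summed cost of frames 0..t-1 against centroid k
    prefix = []
    for c in centroids:
        row = [0.0]
        for f in frames:
            row.append(row[-1] + frame_cost(f, c))
        prefix.append(row)
    # D[k][t]: best cost when centroids 0..k-1 exactly cover frames 0..t-1
    D = [[math.inf] * (T + 1) for _ in range(K + 1)]
    back = [[0] * (T + 1) for _ in range(K + 1)]
    D[0][0] = 0.0
    for k in range(1, K + 1):
        for t in range(1, T + 1):
            # d = run length of centroid k-1, which ends at frame t-1
            for d in range(min_run, min(max_run, t) + 1):
                prev = D[k - 1][t - d]
                if prev == math.inf:
                    continue
                cost = prev + prefix[k - 1][t] - prefix[k - 1][t - d]
                if cost < D[k][t]:
                    D[k][t] = cost
                    back[k][t] = d
    if D[K][T] == math.inf:
        return math.inf, []
    # Trace back the run length chosen for each centroid
    runs, t = [], T
    for k in range(K, 0, -1):
        runs.append(back[k][t])
        t -= back[k][t]
    return D[K][T], runs[::-1]


def recognize(frames: Sequence[Vector], models: Dict[str, Sequence[Vector]],
              min_run: int = 1, max_run: Optional[int] = None) -> Tuple[Optional[str], float]:
    """Isolated-word decision: pick the word whose model aligns with the lowest cost."""
    best_word, best_cost = None, math.inf
    for word, centroids in models.items():
        cost, _ = align(frames, centroids, min_run, max_run)
        if cost < best_cost:
            best_word, best_cost = word, cost
    return best_word, best_cost
```

With six one-dimensional frames `[[0.0]] * 5 + [[5.0]]` and the two-centroid model `[[0.0], [5.0]]`, the unconstrained alignment assigns runs `[5, 1]` at cost 0.0, whereas `max_run=3` forces runs `[3, 3]` at cost 50.0, and `max_run=2` admits no alignment at all. This illustrates how a duration limit can override the purely distance-driven segmentation; whether and by how much such limits lower the word error rate is an empirical question that the paper reports on, not something this toy example demonstrates.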

Author information

Authors and Affiliations

Human Computer Interaction (HCI) Group, Institute of Design & Assessment of Technology, Vienna University of Technology, Favoritenstraße 11, 1040 Vienna, Austria
Helmut Hickersberger & Wolfgang L. Zagler

Corresponding author

Correspondence to Helmut Hickersberger.

About this article

Cite this article

Hickersberger, H., Zagler, W.L. A voice command system for AUTONOMY using a novel speech alignment algorithm. Int J Speech Technol 16, 461–469 (2013). https://doi.org/10.1007/s10772-013-9196-2

Received: 26 November 2012
Accepted: 12 March 2013
Published: 25 April 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10772-013-9196-2
