A voice command system for AUTONOMY using a novel speech alignment algorithm

International Journal of Speech Technology

Abstract

The Viterbi dynamic programming algorithm is currently the de facto standard by which speech recognizers handle variations in the duration of sub-word units: it aligns the sub-word units of an utterance with the sub-word unit models. The algorithm is an integral part of hidden Markov model speech recognizers. In this work, a robust and simple voice command system is developed, implemented, and tested. Instead of Viterbi alignment, it uses a novel speech alignment algorithm, the so-called “run-length limited dynamic programming algorithm” (RLL-DP). The voice command system facilitates the operation of the AUTONOMY system, an environmental control system combined with an alternative and augmentative communication system, by using isolated words as voice commands. Activating the “run-length limits” yields a statistically significant reduction of the word error rate, even when simple “centroid sequence word models” are used instead of the acoustic models based on “hidden control neural networks” used in previous versions.
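
The abstract does not spell out how RLL-DP works, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' implementation. It assumes that a "centroid sequence word model" is an ordered list of feature vectors, that alignment assigns consecutive frames to each centroid in left-to-right order, and that a "run-length limit" bounds how many consecutive frames one centroid may absorb. The names `rll_align`, `recognize`, `min_run`, and `max_run`, as well as the squared Euclidean distance, are hypothetical choices made for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rll_align(frames, centroids, min_run=1, max_run=None):
    """Align frames to a centroid sequence with bounded run lengths.

    Returns (cost, runs): the minimal total distance and the number of
    consecutive frames assigned to each centroid, or (inf, None) if no
    alignment satisfies the limits.
    """
    T, S = len(frames), len(centroids)
    if max_run is None:
        max_run = T  # no upper limit: plain left-to-right alignment
    INF = math.inf
    # best[s][t]: minimal cost of covering the first t frames with the first s centroids
    best = [[INF] * (T + 1) for _ in range(S + 1)]
    back = [[0] * (T + 1) for _ in range(S + 1)]
    best[0][0] = 0.0
    for s in range(1, S + 1):
        # prefix[t]: cost of assigning frames[0:t] entirely to centroid s-1
        prefix = [0.0]
        for f in frames:
            prefix.append(prefix[-1] + sq_dist(f, centroids[s - 1]))
        for t in range(1, T + 1):
            # Only runs within [min_run, max_run] are allowed for this centroid.
            for run in range(min_run, min(max_run, t) + 1):
                prev = best[s - 1][t - run]
                if prev == INF:
                    continue
                cost = prev + prefix[t] - prefix[t - run]
                if cost < best[s][t]:
                    best[s][t] = cost
                    back[s][t] = run
    if best[S][T] == INF:
        return INF, None
    # Trace back the run length chosen for each centroid.
    runs, t = [], T
    for s in range(S, 0, -1):
        runs.append(back[s][t])
        t -= back[s][t]
    runs.reverse()
    return best[S][T], runs

def recognize(frames, word_models, min_run=1, max_run=None):
    """Pick the isolated word whose model aligns with the lowest cost."""
    scores = {w: rll_align(frames, c, min_run, max_run)[0]
              for w, c in word_models.items()}
    return min(scores, key=scores.get)
```

With `max_run` left unset, the sketch reduces to an unconstrained left-to-right alignment in which a single centroid may absorb arbitrarily many frames. Setting `max_run` rules out such degenerate alignments, which is one plausible reading of why activating run-length limits could lower the word error rate.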

Corresponding author

Correspondence to Helmut Hickersberger.

Cite this article

Hickersberger, H., Zagler, W.L. A voice command system for AUTONOMY using a novel speech alignment algorithm. Int J Speech Technol 16, 461–469 (2013). https://doi.org/10.1007/s10772-013-9196-2

  • DOI: https://doi.org/10.1007/s10772-013-9196-2

Keywords

Navigation