Improving Humanoid Robot Speech Recognition with Sound Source Localisation

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2014 (ICANN 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)

Abstract

In this paper we propose an embodied approach to automatic speech recognition in which a humanoid robot adjusts its orientation to the angle that increases the signal-to-noise ratio of speech. In other words, the robot turns its face to 'hear' the speaker better, similar to what people with auditory deficiencies do. The robot tracks a speaker with a binaural sound source localisation (SSL) system that uses spiking neural networks to model the areas of the mammalian auditory pathway relevant to SSL. The accuracy of speech recognition is doubled when the robot orients towards the speaker at an optimal angle and listens through only one ear instead of averaging the input from both ears.
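
To make the idea of binaural localisation concrete, the following is a minimal sketch of how a direction can be recovered from two ear signals. It does not reproduce the spiking neural network described in the abstract; it substitutes a plain cross-correlation estimate of the interaural time difference and a far-field geometric model. The function names, the ear distance, the sample rate and the sign convention (positive azimuth towards the left ear) are all assumptions made for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_lag(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means `right` is a delayed copy of `left`,
    i.e. the sound reached the left ear first.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_azimuth(lag, sample_rate, ear_distance):
    """Convert a lag to an azimuth in degrees (positive = towards the left ear).

    Uses the far-field model itd = (ear_distance / c) * sin(azimuth).
    """
    itd = lag / sample_rate
    sin_theta = itd * SPEED_OF_SOUND / ear_distance
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp numerical overshoot
    return math.degrees(math.asin(sin_theta))


# Hypothetical test signal: a noise burst that reaches the right ear
# 5 samples after the left ear.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(400)]
right = [0.0] * 5 + left[:-5]

lag = estimate_lag(left, right, max_lag=20)
azimuth = lag_to_azimuth(lag, sample_rate=16000, ear_distance=0.15)
print(lag, round(azimuth, 1))
```

In a setup like the one the abstract describes, an angle estimate of this kind would be what lets the robot turn its head towards the speaker before recognition is attempted.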

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Dávila-Chacón, J., Twiefel, J., Liu, J., Wermter, S. (2014). Improving Humanoid Robot Speech Recognition with Sound Source Localisation. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_78

  • DOI: https://doi.org/10.1007/978-3-319-11179-7_78

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11178-0

  • Online ISBN: 978-3-319-11179-7

  • eBook Packages: Computer Science, Computer Science (R0)
