Improving Humanoid Robot Speech Recognition with Sound Source Localisation

Dávila-Chacón, Jorge; Twiefel, Johannes; Liu, Jindong; Wermter, Stefan

doi:10.1007/978-3-319-11179-7_78

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8681)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

In this paper we propose an embodied approach to automatic speech recognition, in which a humanoid robot adjusts its orientation to the angle that increases the signal-to-noise ratio of speech. In other words, the robot turns its face to 'hear' the speaker better, similar to what people with auditory deficiencies do. The robot tracks a speaker with a binaural sound source localisation (SSL) system that uses spiking neural networks to model the areas of the mammalian auditory pathway relevant for SSL. The accuracy of speech recognition is doubled when the robot orients towards the speaker at an optimal angle and listens through only one ear instead of averaging the input from both ears.
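
To make the idea of choosing an orientation by signal-to-noise ratio concrete, the following is a minimal sketch, not the authors' method. It assumes that a mean speech power and a mean noise power are already available for each candidate head angle. The function names `snr_db` and `best_orientation` and all numeric values are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, Tuple


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two mean power values."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def best_orientation(power_by_angle: Dict[float, Tuple[float, float]]) -> float:
    """Return the head angle (in degrees) whose measurement has the highest SNR."""
    return max(power_by_angle, key=lambda angle: snr_db(*power_by_angle[angle]))


# Hypothetical (speech power, noise power) pairs per head angle.
# These values are illustrative only and are not taken from the paper.
measurements = {
    0.0: (1.0, 0.5),
    45.0: (2.0, 0.5),
    90.0: (1.5, 0.5),
}

angle = best_orientation(measurements)
print(f"best angle: {angle} deg, SNR: {snr_db(*measurements[angle]):.2f} dB")
```

In this sketch the angle is selected from a fixed table of measurements. The abstract describes the robot tracking the speaker with a spiking neural network SSL system, which this example does not attempt to reproduce.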

Author information

Authors and Affiliations

Department of Informatics, Knowledge Technology Group, University of Hamburg, Vogt-Kölln-Straße 30, 22527, Hamburg, Germany
Jorge Dávila-Chacón, Johannes Twiefel & Stefan Wermter
Department of Computing, Imperial College London, Huxley Building, South Kensington Campus, London, SW7 2AZ, UK
Jindong Liu

Editor information

Editors and Affiliations

Department of Informatics, University of Hamburg, Vogt-Kölln-Straße 30, 22527, Hamburg, Germany
Stefan Wermter, Cornelius Weber & Sven Magg
Department of Informatics, Nicolaus Copernicus University, ul. Grudziądzka 5, 87-100, Torun, Poland
Włodzisław Duch
Department of Modern Languages, University of Helsinki, P.O. Box 24, 00014, Helsinki, Finland
Timo Honkela
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. G. Bonchev str. bl. 25A, 1113, Sofia, Bulgaria
Petia Koprinkova-Hristova
Institute of Neural Information Processing, University of Ulm, 89069, Oberer Eselsberg, Ulm, Germany
Günther Palm
Department of Information Systems, Quartier UNIL-Dorigny, Bâtiment Internef, University of Lausanne, 1015, Lausanne, Switzerland
Alessandro E. P. Villa

About this paper

Cite this paper

Dávila-Chacón, J., Twiefel, J., Liu, J., Wermter, S. (2014). Improving Humanoid Robot Speech Recognition with Sound Source Localisation. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_78

DOI: https://doi.org/10.1007/978-3-319-11179-7_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7
eBook Packages: Computer Science, Computer Science (R0)
