Imagined, Intended, and Spoken Speech Envelope Synthesis from Neuromagnetic Signals

  • Conference paper
Speech and Computer (SPECOM 2021)

Abstract

Neural speech decoding retrieves speech information directly from the brain, offering promise for better communication assistance to patients with locked-in syndrome (e.g., due to amyotrophic lateral sclerosis, ALS). Currently, speech decoding research using non-invasive neural signals is limited to discrete classification of only a few speech units (e.g., words, syllables, or phrases). Considerable work remains to achieve the ultimate goal of decoding continuous speech sounds. One stepping stone toward this goal would be to reconstruct the inner speech envelope in real time from neural activity. Numerous studies have shown that the speech envelope can be tracked during speech perception, but this has not been demonstrated for speech production, imagination, or intention. Here, we attempted to reconstruct the intended, imagined, and spoken speech envelope by decoding the temporal information of speech directly from neural signals. Using magnetoencephalography (MEG), we collected neuromagnetic activity from 7 subjects imagining and speaking various cued phrases and from 7 different subjects speaking yes or no randomly without any cue. We used a bidirectional long short-term memory recurrent neural network (BLSTM-RNN) for single-trial regression of the speech envelope using all brainwaves (0.3–250 Hz). For the phrase stimuli, we obtained average correlation scores of 0.41 and 0.72 for reconstructing the imagined and spoken speech envelope, respectively, both significantly higher than the chance level (<0.1). For the word stimuli, the correlation scores of the reconstructed speech envelope were 0.77 and 0.82 for intended and spoken speech, respectively.
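The correlation scores above compare a reconstructed envelope against the reference envelope of the recorded speech. As an illustration only, not the authors' exact pipeline, the snippet below sketches a crude rectify-and-smooth envelope extractor and the Pearson correlation used as the score; the window length and smoothing method are assumptions, since the abstract does not specify them.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between a reconstructed and a reference envelope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def envelope(signal, win=80):
    """Crude amplitude envelope: full-wave rectify, then moving-average smooth.
    (A Hilbert-transform envelope is the more common choice; this is a
    dependency-free stand-in.)"""
    rect = np.abs(np.asarray(signal, dtype=float))
    kernel = np.ones(win) / win
    return np.convolve(rect, kernel, mode="same")
```

A perfect reconstruction scores 1.0 under `pearson_r`, so the reported values of 0.41–0.82 sit well above the ~0.1 chance level on this scale.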
Furthermore, to evaluate the efficacy of low-frequency neural oscillations in reconstructing the spoken speech envelope, we used delta (0.3–4 Hz) and delta + theta (0.3–8 Hz) brainwaves and found that performance for the word stimuli was significantly lower than when all frequencies were used, whereas no such significant difference was observed for the phrase stimuli. These findings provide a foundation for direct speech synthesis from non-invasive neural signals.
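The delta and delta + theta comparisons rest on band-limiting the neural signal before regression. As a minimal sketch, assuming an ideal FFT-mask band-pass and a hypothetical 1 kHz sampling rate (the paper's actual filter design is not stated in this abstract), the two bands can be isolated like so:

```python
import numpy as np

def fft_bandpass(x, fs, lo, hi):
    """Ideal band-pass: zero all FFT bins outside [lo, hi] Hz.
    Illustrative only; a real pipeline would typically use an FIR/IIR filter."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    return np.fft.irfft(X * mask, n=len(x))

fs = 1000.0                          # assumed sampling rate (Hz)
t = np.arange(0.0, 2.0, 1.0 / fs)    # 2 s of synthetic "MEG" data
meg = (np.sin(2 * np.pi * 2 * t)     # delta-band component (2 Hz)
       + np.sin(2 * np.pi * 6 * t)   # theta-band component (6 Hz)
       + np.sin(2 * np.pi * 40 * t)) # gamma-band component (40 Hz)

delta = fft_bandpass(meg, fs, 0.3, 4.0)        # delta band, as in the paper
delta_theta = fft_bandpass(meg, fs, 0.3, 8.0)  # delta + theta band
```

On this toy signal, the delta output keeps only the 2 Hz component and the delta + theta output keeps the 2 Hz and 6 Hz components, mirroring the band definitions used in the study.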



Acknowledgments

This work was supported by the University of Texas System Brain Initiative under award 362221 and partly by the National Institutes of Health (NIH) under awards R01DC016621 and R03DC013990. We would like to thank Dr. Saleem Malik, Dr. Mark McManis, Kristin Teplansky, Dr. Alan Wisler, Saara Raja, and the volunteering participants.

Author information

Correspondence to Debadatta Dash.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Dash, D., Ferrari, P., Berstis, K., Wang, J. (2021). Imagined, Intended, and Spoken Speech Envelope Synthesis from Neuromagnetic Signals. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_13

  • DOI: https://doi.org/10.1007/978-3-030-87802-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87801-6

  • Online ISBN: 978-3-030-87802-3

  • eBook Packages: Computer Science (R0)
