Voice Restoration After Laryngectomy Based on Magnetic Sensing of Articulator Movement and Statistical Articulation-to-Speech Conversion

  • Conference paper
  • First Online:
Biomedical Engineering Systems and Technologies (BIOSTEC 2016)

Abstract

In this work, we present a silent speech system that is able to generate audible speech from the captured movement of the speech articulators. Our goal is to help laryngectomy patients, i.e. patients who have lost the ability to speak following surgical removal of the larynx, most frequently due to cancer, to recover their voice. In our system, we use a magnetic sensing technique known as Permanent Magnet Articulography (PMA) to capture the movement of the lips and tongue by attaching small magnets to the articulators and monitoring the changes in the magnetic field with sensors placed close to the mouth. The captured sensor data are then transformed into a sequence of speech parameter vectors, from which a time-domain speech signal is finally synthesised. The key component of our system is a parametric transformation that represents the PMA-to-speech mapping. Here, this transformation takes the form of a statistical model (more specifically, a mixture of factor analysers) whose parameters are learned from simultaneous recordings of PMA and speech signals acquired before laryngectomy. To evaluate the performance of our system on voice reconstruction, we recorded two PMA-and-speech databases of different phonetic complexity for several non-impaired subjects. Results show that our system is able to synthesise speech that sounds like the subject's original voice and is also intelligible. However, more work still needs to be done to achieve consistent synthesis for phonetically rich vocabularies.
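
To make the statistical PMA-to-speech mapping concrete, the sketch below illustrates the conditional-Gaussian regression that this family of models implements: a joint density over time-aligned PMA and speech parameter frames is learned, and conversion takes the expected speech parameters given the PMA features. The paper uses a mixture of factor analysers; in this sketch a full-covariance Gaussian mixture stands in for it (both admit the same per-component conditional-mean rule), and all dimensions, mixture orders and names are illustrative assumptions rather than values from the paper.

# Illustrative sketch only: a GMM stands in for the paper's mixture of factor
# analysers; feature sizes and the mixture order are hypothetical.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

D_PMA, D_SPEECH, K = 9, 25, 16    # assumed PMA/speech feature sizes, mixture order

def train_joint_model(pma, speech, n_components=K):
    """Fit a joint density p([x; y]) over time-aligned PMA frames x and speech parameter frames y."""
    joint = np.hstack([pma, speech])                          # shape (T, D_PMA + D_SPEECH)
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           reg_covar=1e-4, max_iter=200, random_state=0).fit(joint)

def pma_to_speech(gmm, pma):
    """MMSE conversion: E[y | x] as a responsibility-weighted sum of per-component conditional means."""
    d = pma.shape[1]
    resp = np.zeros((len(pma), gmm.n_components))
    for k in range(gmm.n_components):                         # p(k | x) from the PMA marginal
        resp[:, k] = gmm.weights_[k] * multivariate_normal.pdf(
            pma, gmm.means_[k, :d], gmm.covariances_[k, :d, :d])
    resp /= resp.sum(axis=1, keepdims=True) + 1e-12
    out = np.zeros((len(pma), gmm.means_.shape[1] - d))
    for k in range(gmm.n_components):                         # E[y | x, k] = mu_y + S_yx S_xx^-1 (x - mu_x)
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d:]
        S_xx, S_yx = gmm.covariances_[k, :d, :d], gmm.covariances_[k, d:, :d]
        out += resp[:, [k]] * (mu_y + (pma - mu_x) @ np.linalg.solve(S_xx, S_yx.T))
    return out                                                # per-frame speech parameter vectors

if __name__ == "__main__":
    rng = np.random.default_rng(0)                            # toy stand-in for parallel recordings
    pma_train, speech_train = rng.normal(size=(2000, D_PMA)), rng.normal(size=(2000, D_SPEECH))
    model = train_joint_model(pma_train, speech_train)
    print(pma_to_speech(model, rng.normal(size=(100, D_PMA))).shape)   # (100, 25)

In the full system, the converted parameter vectors would then drive a vocoder to produce the time-domain waveform described in the abstract, and the training data would be the simultaneous PMA-and-speech recordings acquired before laryngectomy rather than the random toy arrays above.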

Notes

  1. Several speech samples are available in the Demos section of http://www.hull.ac.uk/speech/disarm.

Acknowledgements

This is a summary of independent research funded by the National Institute for Health Research (NIHR)’s Invention for Innovation Programme. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.

Author information

Corresponding author

Correspondence to Jose A. Gonzalez.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gonzalez, J.A. et al. (2017). Voice Restoration After Laryngectomy Based on Magnetic Sensing of Articulator Movement and Statistical Articulation-to-Speech Conversion. In: Fred, A., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2016. Communications in Computer and Information Science, vol 690. Springer, Cham. https://doi.org/10.1007/978-3-319-54717-6_17

  • DOI: https://doi.org/10.1007/978-3-319-54717-6_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54716-9

  • Online ISBN: 978-3-319-54717-6

  • eBook Packages: Computer Science, Computer Science (R0)
