Tongue Contour Tracking in Ultrasound Images with Spatiotemporal LSTM Networks

  • Conference paper
Pattern Recognition (DAGM GCPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11824)

Abstract

Analysis of ultrasound images of the human tongue has many applications, such as tongue modeling, speech therapy, language education, and speech disorder diagnosis. In this paper, we propose a novel ultrasound tongue contour tracker that enforces constraints specific to ultrasound imaging of the tongue, such as the spatial and temporal smoothness of the tongue contours. We use three different LSTM networks in sequence to satisfy these constraints. The first network uses only the spatial image information of each video frame separately. The second and third networks add temporal information to the results of the first, spatial network. The networks are designed with the ultrasound image formation process of the human tongue in mind. We use the polar Brightness-Mode (B-mode) representation of the ultrasound images, which makes it possible to assume that each image column contains at most one contour position. We tested our system on a dataset that we collected from four volunteers while they read written text. The accuracy results are promising and exceed state-of-the-art results, while run times remain practical (several frames per second). We provide the complete results of our system as supplementary material.
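The column-wise assumption and the two smoothness constraints mentioned in the abstract can be illustrated with a small sketch. This is not the authors' LSTM pipeline: it only shows how a contour could be stored as at most one row index per image column, and how spatial and temporal smoothness might be measured on that representation. The per-pixel score map, the threshold of 0.5, and the function names `column_contour`, `spatial_roughness`, and `temporal_roughness` are all hypothetical and chosen for illustration.

```python
from typing import List, Optional

Contour = List[Optional[int]]  # one row index per column, or None if no contour


def column_contour(scores: List[List[float]], threshold: float = 0.5) -> Contour:
    """Pick at most one contour row per column of a score map.

    scores[r][c] is an assumed per-pixel contour likelihood in [0, 1];
    the threshold is an illustrative choice, not a value from the paper.
    """
    n_rows = len(scores)
    n_cols = len(scores[0]) if n_rows else 0
    contour: Contour = []
    for c in range(n_cols):
        # Strongest response in this column (first row wins on ties).
        best_row = max(range(n_rows), key=lambda r: scores[r][c])
        contour.append(best_row if scores[best_row][c] >= threshold else None)
    return contour


def spatial_roughness(contour: Contour) -> float:
    """Mean absolute row jump between neighbouring columns that both hold a point."""
    jumps = [abs(a - b) for a, b in zip(contour, contour[1:])
             if a is not None and b is not None]
    return sum(jumps) / len(jumps) if jumps else 0.0


def temporal_roughness(prev: Contour, curr: Contour) -> float:
    """Mean absolute per-column row change between two consecutive frames."""
    diffs = [abs(a - b) for a, b in zip(prev, curr)
             if a is not None and b is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy 4x4 score map standing in for one polar B-mode frame.
frame_a = [
    [0.1, 0.0, 0.0, 0.2],
    [0.9, 0.2, 0.1, 0.1],
    [0.0, 0.8, 0.7, 0.3],
    [0.0, 0.0, 0.2, 0.1],
]
contour_a = column_contour(frame_a)
print(contour_a)                     # [1, 2, 2, None]
print(spatial_roughness(contour_a))  # 0.5
```

In this representation a tracker's output per frame is a vector with one entry per column, which is the kind of fixed-length sequence a recurrent network can consume; low values of both roughness measures correspond to the smooth contours the abstract describes.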

Acknowledgement

We would like to thank Dr. Naci Dumlu of Pendik State Hospital, Istanbul, for providing the experiment environment.

Author information

Corresponding author

Correspondence to Enes Aslan.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 19352 KB)

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Aslan, E., Akgul, Y.S. (2019). Tongue Contour Tracking in Ultrasound Images with Spatiotemporal LSTM Networks. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science, vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_36

  • DOI: https://doi.org/10.1007/978-3-030-33676-9_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33675-2

  • Online ISBN: 978-3-030-33676-9

  • eBook Packages: Computer Science, Computer Science (R0)
