Tongue Contour Tracking in Ultrasound Images with Spatiotemporal LSTM Networks

Aslan, Enes; Akgul, Yusuf Sinan

doi:10.1007/978-3-030-33676-9_36

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11824)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

Analysis of ultrasound images of the human tongue has many applications, such as tongue modeling, speech therapy, language education, and speech disorder diagnosis. In this paper, we propose a novel ultrasound tongue contour tracker that enforces constraints of ultrasound tongue imaging, such as the spatial and temporal smoothness of the tongue contours. We use three LSTM networks in sequence to satisfy these constraints. The first network uses only the spatial image information of each video frame separately. The second and third networks add temporal information to the results of the first, spatial network. Our networks are designed around the ultrasound image formation process of the human tongue. We use the polar brightness-mode (B-mode) representation of the ultrasound images, which makes it possible to assume that each image column contains at most one contour position. We tested our system on a dataset that we collected from four volunteers while they read written text. The final accuracy results are promising and exceed state-of-the-art results, while keeping run times at reasonable levels (several frames per second). We provide the complete results of our system as supplementary material.
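The key modeling assumption above is that, in the polar B-mode image, each column holds at most one contour position, so a contour can be stored as one optional row index per column. The sketch below illustrates that encoding together with two simple measures of the spatial and temporal smoothness the trackers are meant to enforce. It is an illustration under stated assumptions, not the authors' LSTM method: the per-pixel probability-map input, the 0.5 threshold, and the function names `columns_to_contour`, `spatial_roughness`, and `temporal_roughness` are all hypothetical.

```python
from typing import List, Optional

# Hypothetical encoding: one entry per image column holding the row index of
# the tongue contour, or None when that column contains no contour point.
Contour = List[Optional[int]]


def columns_to_contour(prob: List[List[float]], threshold: float = 0.5) -> Contour:
    """Collapse an H x W per-pixel contour-probability map (polar B-mode
    geometry assumed) into at most one contour row per column."""
    height, width = len(prob), len(prob[0])
    contour: Contour = []
    for col in range(width):
        column = [prob[row][col] for row in range(height)]
        # Strongest response in this column (first one wins on ties).
        best_row = max(range(height), key=lambda r: column[r])
        # Keep it only if it is confident enough; otherwise mark the column empty.
        contour.append(best_row if column[best_row] >= threshold else None)
    return contour


def spatial_roughness(contour: Contour) -> int:
    """Sum of absolute row jumps between neighbouring columns that both hold
    a contour point; smaller values mean a spatially smoother contour."""
    total = 0
    for left, right in zip(contour, contour[1:]):
        if left is not None and right is not None:
            total += abs(left - right)
    return total


def temporal_roughness(prev: Contour, curr: Contour) -> float:
    """Mean absolute row change between two consecutive frames, computed over
    the columns where both frames have a contour point."""
    diffs = [abs(a - b) for a, b in zip(prev, curr) if a is not None and b is not None]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy 3 x 4 probability map (rows = depth samples, columns = scan lines).
frame = [
    [0.1, 0.2, 0.1, 0.0],
    [0.9, 0.7, 0.2, 0.1],
    [0.2, 0.8, 0.3, 0.2],
]
contour = columns_to_contour(frame)
print(contour)                                            # [1, 2, None, None]
print(spatial_roughness(contour))                         # 1
print(temporal_roughness(contour, [1, 1, 2, None]))       # 0.5
```

Storing `None` for empty columns keeps the one-point-per-column constraint explicit in the data structure. In a learned tracker such as the one described in the paper, trained networks would replace the simple argmax-and-threshold step; the roughness functions here are only diagnostics and are not claimed to be the paper's loss or evaluation metric.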

Acknowledgement

We would like to thank Dr. Naci Dumlu of Pendik State Hospital, Istanbul, for providing the experimental environment.

Author information

Authors and Affiliations

Department of Computer Engineering, GIT Vision Lab, Gebze Technical University, Kocaeli, Turkey
Enes Aslan & Yusuf Sinan Akgul
R&D Department, Kuveyt Turk Participation Bank, Kocaeli, Turkey
Enes Aslan

Corresponding author

Correspondence to Enes Aslan.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
University of Hamburg, Hamburg, Germany
Simone Frintrop
University of Münster, Münster, Germany
Xiaoyi Jiang

Electronic supplementary material

Supplementary material 1 (zip 19352 KB)

About this paper

Cite this paper

Aslan, E., Akgul, Y.S. (2019). Tongue Contour Tracking in Ultrasound Images with Spatiotemporal LSTM Networks. In: Fink, G., Frintrop, S., Jiang, X. (eds) Pattern Recognition. DAGM GCPR 2019. Lecture Notes in Computer Science, vol 11824. Springer, Cham. https://doi.org/10.1007/978-3-030-33676-9_36

DOI: https://doi.org/10.1007/978-3-030-33676-9_36
Published: 25 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33675-2
Online ISBN: 978-3-030-33676-9
eBook Packages: Computer Science (R0)