
Captioning Ultrasound Images Automatically

  • Conference paper
  • First Online:
Medical Image Computing and Computer Assisted Intervention – MICCAI 2019 (MICCAI 2019)

Abstract

We describe an automatic natural language processing (NLP)-based image captioning method to describe fetal ultrasound video content by modelling the vocabulary commonly used by sonographers and sonologists. The generated captions are similar to the words spoken by a sonographer when describing the scan experience in terms of visual content and performed scanning actions. Using full-length second-trimester fetal ultrasound videos and text derived from accompanying expert voice-over audio recordings, we train deep learning models consisting of convolutional neural networks and recurrent neural networks in merged configurations to generate captions for ultrasound video frames. We evaluate different model architectures using established general metrics (BLEU, ROUGE-L) and application-specific metrics. Results show that the proposed models can learn joint representations of image and text to generate relevant and descriptive captions for anatomies, such as the spine, the abdomen, the heart, and the head, in clinical fetal ultrasound scans.
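
A minimal illustration of the "merged configuration" mentioned above: the sketch below shows what a merge-style CNN-RNN captioning model could look like in Keras. It is an assumption-laden sketch, not the authors' code; the feature dimension, vocabulary size, caption length, and layer widths are hypothetical placeholders, and the image branch is assumed to take pre-extracted CNN features (e.g. from a pretrained VGG-style network).

```python
# Hypothetical sketch of a merge-style CNN-RNN caption generator (Keras).
# All sizes below are placeholders, not the configuration reported in the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 1000        # placeholder vocabulary size
MAX_CAPTION_LEN = 20     # placeholder maximum caption length (in tokens)
IMG_FEATURE_DIM = 4096   # e.g. penultimate-layer features of a pretrained VGG16

# Image branch: pre-extracted CNN features projected into a joint space.
img_input = layers.Input(shape=(IMG_FEATURE_DIM,), name="image_features")
img_embed = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_input))

# Language branch: the partial caption so far, embedded and encoded by an LSTM.
txt_input = layers.Input(shape=(MAX_CAPTION_LEN,), dtype="int32", name="caption_tokens")
txt_embed = layers.Embedding(VOCAB_SIZE, 256, mask_zero=True)(txt_input)
txt_encoded = layers.LSTM(256)(layers.Dropout(0.5)(txt_embed))

# Merge configuration: the two representations are combined outside the RNN,
# and a softmax over the vocabulary predicts the next caption word.
merged = layers.add([img_embed, txt_encoded])
hidden = layers.Dense(256, activation="relu")(merged)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[img_input, txt_input], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```

At inference time such a model is applied autoregressively: starting from a start token, the most probable next word is appended to the partial caption and fed back in until an end token or the maximum caption length is reached.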


Acknowledgement

We acknowledge the ERC (ERC-ADG-2015 694 project PULSE), the EPSRC (EP/MO13774/1), the Rhodes Trust, and the NIHR BRC funding scheme.

Author information

Corresponding author

Correspondence to Mohammad Alsharid.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1985 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Alsharid, M., Sharma, H., Drukker, L., Chatelain, P., Papageorghiou, A.T., Noble, J.A. (2019). Captioning Ultrasound Images Automatically. In: Shen, D., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Lecture Notes in Computer Science, vol 11767. Springer, Cham. https://doi.org/10.1007/978-3-030-32251-9_37

  • DOI: https://doi.org/10.1007/978-3-030-32251-9_37

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32250-2

  • Online ISBN: 978-3-030-32251-9

  • eBook Packages: Computer Science, Computer Science (R0)
