TranSQ: Transformer-Based Semantic Query for Medical Report Generation

Kong, Ming; Huang, Zhengxing; Kuang, Kun; Zhu, Qiang; Wu, Fei

doi:10.1007/978-3-031-16452-1_58

Ming Kong¹²,
Zhengxing Huang¹²,
Kun Kuang^12,13,
Qiang Zhu¹² &
…
Fei Wu^12,13,14,15

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13438))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

7359 Accesses
1 Citations

Abstract

Medical report generation, which aims at automatically generating coherent reports with multiple sentences for the given medical images, has received growing research interest due to its tremendous potential in facilitating clinical workflow and improving health services. Due to the highly patterned nature of medical reports, each sentence can be viewed as the description of an image observation with a specific purpose. To this end, this study proposes a novel Transformer-based Semantic Query (TranSQ) model that treats the medical report generation as a direct set prediction problem. Specifically, our model generates a set of semantic features to match plausible clinical concerns and compose the report with sentence retrieval and selection. Experimental results on two prevailing radiology report datasets, i.e., IU X-Ray and MIMIC-CXR, demonstrate that our model outperforms state-of-the-art models on the generation task in terms of both language generation effectiveness and clinical efficacy, which highlights the utility of our approach in generating medical reports with topics of clinical concern as well as sentence-level visual-semantic attention mappings. The source code is available at https://github.com/zjukongming/TranSQ.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Alfarghaly, O., Khaled, R., Elkorany, A., Helal, M., Fahmy, A.: Automated radiology report generation using conditioned transformers. Inf. Med. Unlocked 24, 100557 (2021)
Article Google Scholar
Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6077–6086 (2018)
Google Scholar
Banerjee, S., Lavie, A.: Meteor: An automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pp. 65–72 (2005)
Google Scholar
Brady, A., Laoide, R.Ó., McCarthy, P., McDermott, R.: Discrepancy and error in radiology: concepts, causes and consequences. Ulster Med. J. 81(1), 3 (2012)
Google Scholar
Carion, N., et al.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Chen, X., et al.: Microsoft coco captions: data collection and evaluation server. arXiv preprint arXiv:1504.00325 (2015)
Chen, Z., Song, Y., Chang, T.H., Wan, X.: Generating radiology reports via memory-driven transformer. arXiv preprint arXiv:2010.16056 (2020)
Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.: Meshed-memory transformer for image captioning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10578–10587 (2020)
Google Scholar
Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inf. Assoc. 23(2), 304–310 (2016)
Google Scholar
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Irvin, J., et al.: Chexpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 590–597 (2019)
Google Scholar
Jing, B., Wang, Z., Xing, E.: Show, describe and conclude: on exploiting the structure information of chest x-ray reports. arXiv preprint arXiv:2004.12274 (2020)
Jing, B., Xie, P., Xing, E.: On the automatic generation of medical imaging reports. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 2577–2586 (2018)
Google Scholar
Johnson, A.E., et al.: Mimic-cxr-jpg, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042 (2019)
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Quart. 2(1–2), 83–97 (1955)
Article MathSciNet MATH Google Scholar
Li, C.Y., Liang, X., Hu, Z., Xing, E.P.: Knowledge-driven encode, retrieve, paraphrase for medical image report generation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 6666–6673 (2019)
Google Scholar
Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. Adv. Neural Inf. Process. Syst. 31 (2018)
Google Scholar
Lin, C.Y.: Rouge: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)
Google Scholar
Liu, F., Ge, S., Wu, X.: Competence-based multimodal curriculum learning for medical report generation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 3001–3012 (2021)
Google Scholar
Liu, F., Wu, X., Ge, S., Fan, W., Zou, Y.: Exploring and distilling posterior and prior knowledge for radiology report generation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 13753–13762 (2021)
Google Scholar
Liu, F., You, C., Wu, X., Ge, S., Sun, X., et al.: Auto-encoding knowledge graph for unsupervised medical report generation. Adv. Neural Inf. Process. Syst. 34 (2021)
Google Scholar
Loshchilov, I., Hutter, F.: Fixing weight decay regularization in Adam. arXiv preprint arXiv:1711.05101 (2017)
Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 375–383 (2017)
Google Scholar
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318 (2002)
Google Scholar
Radford, A., et al.: Language models are unsupervised multitask learners. OpenAI blog 1(8), 9 (2019)
Google Scholar
Reimers, N., Gurevych, I.: Sentence-bert: sentence embeddings using siamese bert-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (2019). https://arxiv.org/abs/1908.10084
Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-critical sequence training for image captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7008–7024 (2017)
Google Scholar
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
Google Scholar
Wei, X., Zhang, T., Li, Y., Zhang, Y., Wu, F.: Multi-modality cross attention network for image and sentence matching. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10941–10950 (2020)
Google Scholar
Wu, T., Huang, Q., Liu, Z., Wang, Y., Lin, D.: Distribution-balanced loss for multi-label classification in long-tailed datasets. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 162–178. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_10
Yang, X., Ye, M., You, Q., Ma, F.: Writing by memorizing: Hierarchical retrieval-based medical report generation. arXiv preprint arXiv:2106.06471 (2021)
You, D., Liu, F., Ge, S., Xie, X., Zhang, J., Wu, X.: AlignTransformer: hierarchical alignment of visual regions and disease tags for medical report generation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12903, pp. 72–82. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87199-4_7

Download references

Acknowledgments

This work was supported in part by Key Laboratory for Corneal Diseases Research of Zhejiang Province, Key R & D Projects of the Ministry of Science and Technology (2020YFC0832500), Project by Shanghai AI Laboratory (P22KS00111), and the Starry Night Science Fund of Zhejiang University Shanghai Institute for Advanced Study (SN-ZJU-SIAS-0010).

Author information

Authors and Affiliations

Institute of Artificial Intelligence, Zhejiang University, Hangzhou, China
Ming Kong, Zhengxing Huang, Kun Kuang, Qiang Zhu & Fei Wu
Key Laboratory for Corneal Diseases Research of Zhejiang Province, Hangzhou, China
Kun Kuang & Fei Wu
Shanghai Institute for Advanced Study of Zhejiang University, Shanghai, China
Fei Wu
Shanghai AI Laboratory, Shanghai, China
Fei Wu

Authors

Ming Kong
View author publications
You can also search for this author in PubMed Google Scholar
Zhengxing Huang
View author publications
You can also search for this author in PubMed Google Scholar
Kun Kuang
View author publications
You can also search for this author in PubMed Google Scholar
Qiang Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Fei Wu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ming Kong .

Editor information

Editors and Affiliations

Rochester Institute of Technology, Rochester, NY, USA
Linwei Wang
Chinese University of Hong Kong, Hong Kong, Hong Kong
Qi Dou
University of Virginia, Charlottesville, VA, USA
P. Thomas Fletcher
National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
Stefanie Speidel
Case Western Reserve University, Cleveland, OH, USA
Shuo Li

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1223 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kong, M., Huang, Z., Kuang, K., Zhu, Q., Wu, F. (2022). TranSQ: Transformer-Based Semantic Query for Medical Report Generation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13438. Springer, Cham. https://doi.org/10.1007/978-3-031-16452-1_58

Download citation

DOI: https://doi.org/10.1007/978-3-031-16452-1_58
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16451-4
Online ISBN: 978-3-031-16452-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)

TranSQ: Transformer-Based Semantic Query for Medical Report Generation