Abstract

Many medical image classification tasks share a common unbalanced data problem. That is, images of the target classes, e.g., certain types of diseases, appear in only a very small portion of the entire dataset. Nowadays, large collections of medical images are readily available. However, it is costly and may not even be feasible for medical experts to manually comb through a huge unlabeled dataset to obtain enough representative examples of the rare classes. In this paper, we propose a new method called Unified LF&SM that recommends the most similar images for each class from a large unlabeled dataset, for verification by medical experts and inclusion in the seed labeled dataset. Our real data augmentation significantly reduces expensive manual labeling time. In our experiments, Unified LF&SM performed best, selecting a high percentage of relevant images in its recommendations and achieving the best classification accuracy. It is easily extendable to other medical image classification problems.
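The recommendation step described above can be illustrated with a generic similarity ranking. The sketch below is not the Unified LF&SM method, whose details are not given in this abstract. It assumes that each image has already been mapped to a feature vector, and it uses cosine similarity as a stand-in similarity measure. The names `cosine_similarity`, `recommend_for_class`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def recommend_for_class(seed_features, unlabeled_features, top_k):
    """Return indices of the top_k unlabeled items most similar to one class.

    Each unlabeled item is scored by its best similarity to any seed
    example of the class (an assumption made for this sketch).
    """
    scored = []
    for idx, feat in enumerate(unlabeled_features):
        best = max(cosine_similarity(feat, seed) for seed in seed_features)
        scored.append((best, idx))
    # Highest similarity first; ties broken by original index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


# Toy data: two seed examples of a rare class and four unlabeled items.
seed_rare = [[1.0, 0.0], [0.9, 0.1]]
unlabeled = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5], [0.95, 0.0]]

# Candidates that would be shown to an expert for verification.
recommended = recommend_for_class(seed_rare, unlabeled, top_k=2)
print(recommended)
```

In a workflow of this kind, only the recommended candidates would be reviewed by an expert, and the verified ones would be added to the seed labeled set for that class.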

Author information

Corresponding author

Correspondence to Chuanhai Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhang, C., Tavanapong, W., Wong, J., de Groen, P.C., Oh, J. (2017). Real Data Augmentation for Medical Image Classification. In: Cardoso, M., et al. Intravascular Imaging and Computer Assisted Stenting, and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. LABELS 2017, STENT 2017, CVII 2017. Lecture Notes in Computer Science, vol 10552. Springer, Cham. https://doi.org/10.1007/978-3-319-67534-3_8

  • DOI: https://doi.org/10.1007/978-3-319-67534-3_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67533-6

  • Online ISBN: 978-3-319-67534-3

  • eBook Packages: Computer Science, Computer Science (R0)
