Abstract
Contrastive representation learning is the state of the art in computer vision, but it requires huge mini-batch sizes, special network designs, or memory banks, making it unappealing for 3D medical imaging. Meanwhile, reconstruction-based self-supervised learning has reached a new height in performance for 3D medical imaging, yet it lacks a mechanism for learning contrastive representations. This paper therefore proposes Parts2Whole, a new framework for self-supervised contrastive learning via reconstruction, so named because it exploits the universal and intrinsic part-whole relationship to learn contrastive representations without a contrastive loss: reconstructing an image (whole) from its own parts compels the model to learn similar latent features for all parts of the same image, while reconstructing different images (wholes) from their respective parts simultaneously pushes parts belonging to different wholes farther apart in the latent space; the trained model is thereby capable of distinguishing images. We have evaluated Parts2Whole on five distinct imaging tasks covering both classification and segmentation, and compared it with four competing publicly available 3D pretrained models. Parts2Whole significantly outperforms them on two of the five tasks and achieves competitive performance on the remaining three. This superior performance is attributable to the contrastive representations learned by Parts2Whole. Code and pretrained models are available at github.com/JLiangLab/Parts2Whole.
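To make the mechanism above concrete, here is a minimal sketch of one Parts2Whole training step. The generic 3D encoder-decoder, the part size, the resizing step, and the mean-squared-error reconstruction loss are all illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def random_part(volume, part_size=(32, 64, 64)):
    """Crop a random sub-volume ("part") from one 3D image ("whole").

    volume: (C, D, H, W) tensor; part_size is an illustrative choice.
    """
    _, d, h, w = volume.shape
    pd, ph, pw = part_size
    z = torch.randint(0, d - pd + 1, (1,)).item()
    y = torch.randint(0, h - ph + 1, (1,)).item()
    x = torch.randint(0, w - pw + 1, (1,)).item()
    return volume[:, z:z + pd, y:y + ph, x:x + pw]

def parts2whole_step(encoder, decoder, wholes, optimizer):
    """One training step: reconstruct each whole from one random part.

    encoder/decoder are hypothetical 3D conv modules mapping a resized
    part to a full-size reconstruction (stand-ins for the paper's
    3D U-Net); wholes: (B, C, D, H, W).
    """
    parts = torch.stack([random_part(v) for v in wholes])
    # Resize parts to the whole's spatial size so one fixed network
    # handles all crops (an assumption about the input pipeline).
    parts = F.interpolate(parts, size=wholes.shape[2:],
                          mode="trilinear", align_corners=False)
    recon = decoder(encoder(parts))
    loss = F.mse_loss(recon, wholes)  # reconstruction drives learning
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because every part of one image must reconstruct the identical whole while parts of other images reconstruct different targets, similarity and separation in the latent space emerge without any explicit contrastive loss term.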
Notes
- 1.
If we consider each whole image itself as a “label”, the training process of Parts2Whole is equivalent to predicting the correct “label” given a part of one image as input, or discriminating each image from its parts (see the first code sketch after these notes).
- 2.
3D U-Net: github.com/ellisdg/3DUnetCNN.
- 3.
Denote the \(l_2\)-normalized features of a positive pair and a negative pair as \(\{\mathcal{F}_E(p_i), \mathcal{F}_E(p'_i)\}\) and \(\{\mathcal{F}_E(p_i), \mathcal{F}_E(p_j)\}\), respectively. The contrastive loss is calculated as \(-\log \frac{\exp(\mathcal{F}_E(p_i) \cdot \mathcal{F}_E(p'_i) / \tau)}{\exp(\mathcal{F}_E(p_i) \cdot \mathcal{F}_E(p'_i) / \tau) + \sum_{j=1}^{5000} \exp(\mathcal{F}_E(p_i) \cdot \mathcal{F}_E(p_j) / \tau)}\), where \(\tau = 0.7\) (a code sketch of this loss follows these notes).
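As a minimal illustration of Note 1, the sketch below casts Parts2Whole as instance discrimination: each whole image acts as its own class, and the model predicts, from a part, which image it came from. The encoder, classifier, and tensor shapes are assumptions for illustration, not the authors' implementation.

```python
import torch.nn.functional as F

def instance_discrimination_view(encoder, classifier, parts, image_ids):
    """Note 1 as code: each whole image serves as its own "label".

    encoder and classifier are hypothetical modules; given parts of
    shape (B, C, D, H, W), the model predicts the source whole's index.
    """
    logits = classifier(encoder(parts))        # (B, num_whole_images)
    return F.cross_entropy(logits, image_ids)  # image_ids: (B,) int64
```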
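The following sketch computes the contrastive loss of Note 3 on \(l_2\)-normalized features with \(\tau = 0.7\) and 5000 negatives; variable names and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def note3_contrastive_loss(f_i, f_pos, f_negs, tau=0.7):
    """Contrastive loss of Note 3 over l2-normalized features.

    f_i, f_pos: (D,) features of a positive pair; f_negs: (5000, D)
    features of the negatives. Names and shapes are illustrative.
    """
    f_i = F.normalize(f_i, dim=0)
    f_pos = F.normalize(f_pos, dim=0)
    f_negs = F.normalize(f_negs, dim=1)
    pos = torch.exp(torch.dot(f_i, f_pos) / tau)
    negs = torch.exp(f_negs @ f_i / tau).sum()
    return -torch.log(pos / (pos + negs))
```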
References
Ardila, D., et al.: End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 25(6), 954–961 (2019)
Armato III, S.G., McLennan, G., Bidaut, L., et al.: The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med. Phys. 38(2), 915–931 (2011)
Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. In: Advances in Neural Information Processing Systems, pp. 15509–15519 (2019)
Bakas, S., Reyes, M., Jakab, A., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BraTS challenge. arXiv preprint arXiv:1811.02629 (2018)
Bilic, P., Christ, P.F., Vorontsov, E., Chlebus, G., Chen, H., Dou, Q., et al.: The liver tumor segmentation benchmark (LiTS). arXiv preprint arXiv:1901.04056 (2019)
Bishop, C.M., et al.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
Caron, M., Misra, I., Mairal, J., et al.: Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882 (2020)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)
Chen, S., Ma, K., Zheng, Y.: Med3D: transfer learning for 3D medical image analysis. arXiv preprint arXiv:1904.00625 (2019)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709 (2020)
Dosovitskiy, A., Springenberg, J.T., Riedmiller, M., Brox, T.: Discriminative unsupervised feature learning with convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 766–774 (2014)
Gibson, E., Li, W., Sudre, C., et al.: NiftyNet: a deep-learning platform for medical imaging. Comput. Methods Programs Biomed. 158, 113–122 (2018)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6707–6717 (2020)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Setio, A.A.A., Traverso, A., De Bel, T., et al.: Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med. Image Anal. 42, 1–13 (2017)
Tajbakhsh, N., Gotway, M.B., Liang, J.: Computer-aided pulmonary embolism detection using a novel vessel-aligned multi-planar image representation and convolutional neural networks. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9350, pp. 62–69. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24571-3_8
Tao, X., Li, Y., Zhou, W., Ma, K., Zheng, Y.: Revisiting Rubik’s cube: self-supervised learning with volume-wise transformation for 3D medical image segmentation. arXiv preprint arXiv:2007.08826 (2020)
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742 (2018)
Zhou, Z., et al.: Models Genesis: generic autodidactic models for 3D medical image analysis. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11767, pp. 384–393. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32251-9_42
Acknowledgments
This research has been supported in part by ASU and Mayo Clinic through a Seed Grant and an Innovation Grant, and in part by the NIH under Award Number R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work has utilized GPUs provided in part by ASU Research Computing and in part by the Extreme Science and Engineering Discovery Environment (XSEDE), funded by the National Science Foundation (NSF) under grant number ACI-1548562. We would like to thank Jiaxuan Pang, Md Mahfuzur Rahman Siddiquee, and Zuwei Guo for evaluating I3D, NiftyNet, and MedicalNet, respectively. The content of this paper is covered by patents pending.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Feng, R., Zhou, Z., Gotway, M.B., Liang, J. (2020). Parts2Whole: Self-supervised Contrastive Learning via Reconstruction. In: Albarqouni, S., et al. (eds.) Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. DART 2020, DCL 2020. Lecture Notes in Computer Science, vol. 12444. Springer, Cham. https://doi.org/10.1007/978-3-030-60548-3_9
DOI: https://doi.org/10.1007/978-3-030-60548-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60547-6
Online ISBN: 978-3-030-60548-3
eBook Packages: Computer Science, Computer Science (R0)