Label-Similarity Curriculum Learning

  • Conference paper
  • First Online:
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12374)

Abstract

Curriculum learning can improve neural network training by guiding the optimization to desirable optima. We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation.

The idea is to use a probability distribution over classes as the target label, where the class probabilities reflect the similarity to the true class. Gradually, this label representation is shifted towards the standard one-hot encoding. That is, in the beginning, minor mistakes are penalized less than large mistakes, resembling a teaching process in which broad concepts are explained before subtle differences are taught.
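
As a rough illustration of this gradual shift, the sketch below (Python/NumPy, not the authors' code) sharpens a similarity-based soft label towards one-hot as training progresses. The linear exponent schedule and the parameter beta_max are assumptions made for illustration, not the schedule used in the paper.

```python
import numpy as np

def curriculum_target(soft_label, epoch, num_epochs, beta_max=20.0):
    """Shift a similarity-based soft label towards one-hot as training progresses.

    Illustrative sketch: the soft distribution is sharpened by an exponent that
    grows linearly with the epoch, so early targets spread probability mass over
    similar classes and late targets approach the one-hot encoding. The schedule
    and `beta_max` are assumptions, not the paper's parameterization.
    """
    beta = 1.0 + (beta_max - 1.0) * epoch / max(1, num_epochs - 1)
    sharpened = soft_label ** beta
    return sharpened / sharpened.sum()

# Example: three classes, class 0 is the true class.
soft = np.array([0.5, 0.3, 0.2])
print(curriculum_target(soft, epoch=0, num_epochs=90))   # stays close to the soft label
print(curriculum_target(soft, epoch=89, num_epochs=90))  # close to one-hot
```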

The class similarity can be based on prior knowledge. For the special case of the labels being natural words, we propose a generic way to automatically compute the similarities. The natural words are embedded into Euclidean space using a standard word embedding. The probability of each class is then a function of the cosine similarity between the vector representations of the class and the true label.
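
A minimal sketch of this construction, under the assumption that the class probabilities are obtained by a softmax over the cosine similarities; the temperature parameter and the toy embeddings below are illustrative and not taken from the paper:

```python
import numpy as np

def similarity_soft_labels(class_embeddings, temperature=0.1):
    """Build one soft target distribution per class from word-embedding similarities.

    Sketch under assumptions: `class_embeddings` holds one embedding vector per
    class name (shape: num_classes x dim); a softmax over cosine similarities,
    scaled by a hypothetical `temperature`, turns similarity to the true class
    into class probabilities. The exact mapping in the paper may differ.
    """
    normed = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    cos_sim = normed @ normed.T                      # pairwise cosine similarities
    logits = cos_sim / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # row i: target when class i is true

# Toy example with random vectors standing in for word embeddings of 4 class names.
rng = np.random.default_rng(0)
targets = similarity_soft_labels(rng.normal(size=(4, 50)))
print(targets[0])  # concentrated on class 0, with some mass on similar classes
```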

The proposed label-similarity curriculum learning (LCL) approach was empirically evaluated using several popular deep learning architectures for image classification tasks applied to five datasets including ImageNet, CIFAR100, and AWA2. In all scenarios, LCL was able to improve the classification accuracy on the test data compared to standard training. Code to reproduce results is available at https://github.com/speedystream/LCL.

Notes

  1.

    The code to reproduce our results is available in the supplementary material.

  2.

    Horovod trains with large batches distributed over multiple GPU nodes; some loss in accuracy for the baseline method is expected and well established. For more details, see Table 1 and Table 2.c in [24].

  3.

    We tried \(\epsilon \in \{0.8, 0.9, 0.91, \ldots, 0.98, 0.99, 0.992, \ldots, 0.998, 0.999\}\) for ResNet-50 and ResNet-101. The results showed that the search space for \(\epsilon\) can be less granular, and we limited it accordingly.

References

  1. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: International Conference on Machine Learning (ICML), pp. 41–48. ACM (2009)

  2. Buciluǎ, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 535–541. ACM (2006)

  3. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

  4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255. IEEE (2009)

  5. Florensa, C., Held, D., Wulfmeier, M., Zhang, M., Abbeel, P.: Reverse curriculum generation for reinforcement learning. In: Conference on Robot Learning (CoRL), pp. 482–495 (2017)

  6. Frome, A., et al.: DeViSE: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 2121–2129 (2013)

  7. Furlanello, T., Lipton, Z.C., Tschannen, M., Itti, L., Anandkumar, A.: Born again neural networks (2018). arXiv:1805.04770 [stat.ML]

  8. García, S., Herrera, F.: An extension on statistical “comparisons of classifiers over multiple data sets” for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)

  9. Goyal, P., et al.: Accurate, large minibatch SGD: Training ImageNet in 1 hour (2017). arXiv:1706.02677v2 [cs.CV]

  10. Graves, A., Bellemare, M.G., Menick, J., Munos, R., Kavukcuoglu, K.: Automated curriculum learning for neural networks. In: International Conference on Machine Learning (ICML), pp. 1311–1320. PMLR (2017)

  11. Guo, S., et al.: CurriculumNet: weakly supervised learning from large-scale web images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 139–154. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_9

  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. IEEE (2016)

  13. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: Deep Learning and Representation Learning Workshop: NIPS 2014 (2014)

  14. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141. IEEE (2018)

  15. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708. IEEE (2017)

  16. Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543 (2002)

  17. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)

  18. Kumar, M.P., Packer, B., Koller, D.: Self-paced learning for latent variable models. In: Advances in Neural Information Processing Systems (NeurIPS), pp. 1189–1197 (2010)

  19. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning (ICML), pp. 1188–1196 (2014)

  20. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013). arXiv:1301.3781v3 [cs.CL]

  21. Müller, R., Kornblith, S., Hinton, G.: When does label smoothing help? (2019). arXiv:1906.02629 [cs.LG]

  22. Orbes-Arteaga, M., et al.: Knowledge distillation for semi-supervised domain adaptation. In: Zhou, L., et al. (eds.) OR 2.0/MLCN -2019. LNCS, vol. 11796, pp. 68–76. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32695-1_8

  23. Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE (2016)

  24. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NeurIPS 2017 Workshop Autodiff (2017)

  25. Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., Bengio, Y.: FitNets: hints for thin deep nets (2014). arXiv:1412.6550 [cs.LG]

  26. Sergeev, A., Del Balso, M.: Horovod: fast and easy distributed deep learning in TensorFlow (2018). arXiv:1802.05799v3 [cs.LG]

  27. Silla, C.N., Freitas, A.A.: A survey of hierarchical classification across different application domains. Data Min. Knowl. Disc. 22(1), 31–72 (2011). https://doi.org/10.1007/s10618-010-0175-9

  28. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (AAAI), pp. 4278–4284 (2017)

  29. Van Horn, G., et al.: Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  30. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset. Technical report. CNS-TR-2011-001, California Institute of Technology (2011)

  31. Weinshall, D., Cohen, G., Amir, D.: Curriculum learning by transfer learning: theory and experiments with deep networks. In: International Conference on Machine Learning (ICML), pp. 5235–5243. PMLR (2018)

  32. Wu, C., Tygert, M., LeCun, Y.: Hierarchical loss for classification (2017). arXiv:1709.01062v1 [cs.LG]

  33. Xian, Y., Lampert, C.H., Schiele, B., Akata, Z.: Zero-shot learning - a comprehensive evaluation of the good, the bad and the ugly. IEEE Trans. Pattern Anal. Mach. Intell. 41, 2251–2265 (2018)

  34. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1492–1500. IEEE (2017)

  35. Yim, J., Joo, D., Bae, J., Kim, J.: A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4133–4141 (2017)

  36. Zagoruyko, S., Komodakis, N.: Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer (2016). arXiv:1612.03928 [cs.CV]

  37. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: British Machine Vision Conference (BMVC). BMVA Press (2016)

  38. Zhang, Y., Xiang, T., Hospedales, T.M., Lu, H.: Deep mutual learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4320–4328 (2018)


Author information


Corresponding author

Correspondence to Ürün Dogan.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1672 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dogan, Ü., Deshmukh, A.A., Machura, M.B., Igel, C. (2020). Label-Similarity Curriculum Learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol. 12374. Springer, Cham. https://doi.org/10.1007/978-3-030-58526-6_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-58526-6_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58525-9

  • Online ISBN: 978-3-030-58526-6

  • eBook Packages: Computer Science, Computer Science (R0)
