
Knowledge Distillation via Channel Correlation Structure

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12815)

Abstract

Knowledge distillation (KD) has been one of the most popular techniques for model compression and acceleration, where a compact student model is trained under the guidance of a large-capacity teacher model. The key to existing KD methods is to explore multiple types of knowledge that direct the student to mimic the teacher’s behaviour. To this end, we aim at exploring the knowledge contained in the channel correlation structure, in terms of both intra-instance and inter-instance relationships within a mini-batch, which can be extracted and transferred from the teacher’s various outputs. Specifically, we propose a novel KD loss derived from the Channel Correlation Structure (CCS), covering both feature-based and relation-based knowledge. With this loss, we align the channel correlations of the teacher’s and student’s feature maps through their channel correlation matrices. Extensive experiments on two benchmark datasets verify the effectiveness of our method compared with other KD methods.
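To make the channel-correlation idea concrete, the following is a minimal, hypothetical PyTorch sketch of an intra-instance channel-correlation alignment loss. It is not the authors’ implementation: the function names (channel_correlation, ccs_style_loss) are ours, it assumes teacher and student feature maps share the same channel count (in practice a projection may be needed), and it omits the inter-instance part of CCS as well as the usual cross-entropy/soft-label terms it would be combined with.

```python
# Hypothetical sketch of a channel-correlation-style KD loss
# (not the paper's reference implementation).
import torch
import torch.nn.functional as F


def channel_correlation(feat: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Per-instance channel correlation matrix of shape (B, C, C).

    Each channel is flattened to an (H * W)-vector; entries follow the
    Pearson form Cov(c_i, c_j) / sqrt(Var(c_i) * Var(c_j)).
    """
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=2, keepdim=True)            # center each channel
    cov = x @ x.transpose(1, 2) / (h * w - 1)      # (B, C, C) covariance
    std = cov.diagonal(dim1=1, dim2=2).clamp_min(eps).sqrt()
    return cov / (std.unsqueeze(2) * std.unsqueeze(1))


def ccs_style_loss(student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
    """L2 alignment of student vs. teacher channel-correlation matrices."""
    corr_s = channel_correlation(student_feat)
    corr_t = channel_correlation(teacher_feat).detach()   # teacher is fixed
    return F.mse_loss(corr_s, corr_t)


if __name__ == "__main__":
    student = torch.randn(4, 64, 8, 8, requires_grad=True)  # toy student features
    teacher = torch.randn(4, 64, 8, 8)                      # toy teacher features
    loss = ccs_style_loss(student, teacher)
    loss.backward()
    print(float(loss))
```

In such a setup, the correlation loss would typically be added to the standard KD objective with a weighting coefficient; an inter-instance variant could be obtained analogously by correlating channel responses across the samples of a mini-batch.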


Notes

  1. Cov and Var denote covariance and variance, respectively.
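For context, the footnote suggests the correlation entries take the standard Pearson form; a generic statement of that definition for two channel activation vectors c_i and c_j (notation ours, not necessarily the paper’s exact symbols) is:

```latex
% Pearson correlation between channel activations c_i and c_j
% (illustrative notation; the paper's exact definition may differ)
\rho_{ij} = \frac{\operatorname{Cov}(c_i, c_j)}
                 {\sqrt{\operatorname{Var}(c_i)\,\operatorname{Var}(c_j)}}
```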


Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under Grant 61771273, and the R&D Program of Shenzhen under Grant JCYJ20180508152204044.

Author information


Corresponding author

Correspondence to Bin Chen.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, B., et al. (2021). Knowledge Distillation via Channel Correlation Structure. In: Qiu, H., Zhang, C., Fei, Z., Qiu, M., Kung, S.Y. (eds.) Knowledge Science, Engineering and Management. KSEM 2021. Lecture Notes in Computer Science (LNAI), vol. 12815. Springer, Cham. https://doi.org/10.1007/978-3-030-82136-4_29


  • DOI: https://doi.org/10.1007/978-3-030-82136-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82135-7

  • Online ISBN: 978-3-030-82136-4

  • eBook Packages: Computer Science, Computer Science (R0)
