Abstract
In this paper, we investigate Hebbian learning strategies applied to Convolutional Neural Network (CNN) training. We consider two unsupervised learning approaches: Hebbian Winner-Takes-All (HWTA) and Hebbian Principal Component Analysis (HPCA). The Hebbian learning rules are used to train the layers of a CNN in order to extract features that are then used for classification, without requiring backpropagation (backprop). Experimental comparisons are made with state-of-the-art unsupervised (but backprop-based) Variational Auto-Encoder (VAE) training. For completeness, we consider two supervised Hebbian learning variants (Supervised Hebbian Classifiers, SHC, and Contrastive Hebbian Learning, CHL) for training the final classification layer, which are compared to Stochastic Gradient Descent (SGD) training. We also investigate hybrid learning methodologies, in which some network layers are trained following the Hebbian approach and others are trained by backprop. We tested our approaches on the MNIST, CIFAR10, and CIFAR100 datasets. Our results suggest that Hebbian learning is generally suitable for training early feature extraction layers, or for retraining higher network layers in fewer training epochs than backprop. Moreover, our experiments show that Hebbian learning outperforms VAE training, with HPCA generally performing better than HWTA.
Notes
The code to reproduce the experiments is available at: github.com/GabrieleLagani/HebbianPCA/tree/hebbpca.
References
Amato G, Carrara F, Falchi F, Gennaro C, Lagani G (2019) Hebbian learning meets deep convolutional neural networks. In: International conference on image analysis and processing. Springer, pp 324–334
Bahroun Y, Soltoggio A (2017) Online representation learning with single and multi-layer hebbian networks for image classification. In: International conference on artificial neural networks. Springer, pp 354–363
Becker S, Plumbley M (1996) Unsupervised neural network learning procedures for feature extraction and classification. Appl Intell 6(3):185–203
Diehl PU, Cook M (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front Comput Neurosci 9:99
Ferré P, Mamalet F, Thorpe SJ (2018) Unsupervised feature learning with winner-takes-all based STDP. Front Comput Neurosci 12:24
Földiak P (1989) Adaptive network for optimal linear feature extraction. In: Proceedings of IEEE/INNS international joint conference on neural networks, vol 1, pp 401–405
Grossberg S (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol Cybern 23(3):121–134
Haykin S (2009) Neural networks and learning machines, 3rd edn. Pearson
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 558–567
Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2016) beta-VAE: Learning basic visual concepts with a constrained variational framework
Hyvarinen A, Karhunen J, Oja E (2002) Independent component analysis. Stud Inf Control 11(2):205–207
Karhunen J, Joutsensalo J (1995) Generalizations of principal component analysis, optimization problems, and neural networks. Neural Netw 8(4):549–562
Kingma DP, Welling M (2013) Auto-encoding variational bayes
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69
Kolda TG, Lewis RM, Torczon V (2003) Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev 45(3):385–482
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems
Lagani G (2019) Hebbian learning algorithms for training convolutional neural networks. Master’s thesis, School of Engineering, University of Pisa, Italy. https://etd.adm.unipi.it/theses/available/etd-03292019-220853/
LeCun Y, Bottou L, Bengio Y, Haffner P et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Miconi T, Clune J, Stanley KO (2018) Differentiable plasticity: training plastic neural networks with backpropagation
Movellan JR (1991) Contrastive hebbian learning in the continuous hopfield model. In: Connectionist models. Elsevier, pp 10–17
Olshausen BA (1996) Learning linear, sparse, factorial codes
Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607
O’Reilly RC (1996) Biologically plausible error-driven learning using local activation differences: the generalized recirculation algorithm. Neural Comput 8(5):895–938
O’Reilly RC (2001) Generalization in interactive networks: the benefits of inhibitory competition and hebbian learning. Neural Comput 13(6):1199–1241
O’Reilly RC, Munakata Y (2000) Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press
Pehlevan C, Chklovskii DB (2015) Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, pp 1458–1465
Pehlevan C, Hu T, Chklovskii DB (2015) A hebbian/anti-hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Comput 27(7):1461–1495
Ponulak F (2005) ReSuMe—new supervised learning method for spiking neural networks. Technical report, Institute of Control and Information Engineering, Poznan University of Technology
Rozell CJ, Johnson DH, Baraniuk RG, Olshausen BA (2008) Sparse coding via thresholding and local competition in neural circuits. Neural Comput 20(10):2526–2563
Rumelhart DE, Zipser D (1985) Feature discovery by competitive learning. Cogn Sci 9(1):75–112
Sanger TD (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 2(6):459–473
Shrestha A, Ahmed K, Wang Y, Qiu Q (2017) Stable spike-timing dependent plasticity rule for multilayer unsupervised and supervised learning. In: International joint conference on neural networks (IJCNN). IEEE, pp 1999–2006
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484
Wadhwa A, Madhow U (2016) Bottom-up deep learning using the hebbian principle
Wadhwa A, Madhow U (2016) Learning sparse, distributed representations using the hebbian principle
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487
Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks?
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was partially supported by the H2020 project AI4EU under GA 825619 and by the H2020 project AI4Media under GA 951911.
Appendices
Appendix 1: Supplementary results
In this Appendix, we present additional results on the MNIST, CIFAR10, and CIFAR100 datasets. Tables 7, 9, and 11 show the results of hybrid training, in which some network layers are trained by supervised backprop and others with the Hebbian approach. Tables 8, 10, and 12 show the results of SHC and CHL classifiers, compared with SGD classifiers, trained on the features extracted from the various layers of a pre-trained network.
1.1 MNIST
1.1.1 Hybrid network models
In Table 7, we report the results obtained on the MNIST test set with hybrid networks. Each row reports the results for a network with a different combination of Hebbian and backprop layers (the first row below the header represents the network fully trained with backprop). We use the letter “H” to denote layers trained with the Hebbian approach and the letter “B” for layers trained with backprop. The letter “G” denotes the final classifier (corresponding to the sixth layer), which was trained with SGD in all cases in order to make comparisons on an equal footing. The last two columns show the accuracy obtained with the corresponding combination of layers.
Table 7 allows us to understand the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. Both HPCA and HWTA achieve results comparable to full backprop training. A result slightly higher than full backprop is observed when layer 5 is replaced, suggesting that some combinations of layers might actually help performance. In the subsequent rows, more layers are switched from backprop to Hebbian training and a slight performance drop is observed, but the HPCA approach generally performs better than HWTA as more Hebbian layers are involved. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of more than 2 percentage points over HWTA.
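As an illustrative aside, the core of the unsupervised HWTA rule used for the “H” layers can be sketched in a few lines. This is a generic winner-takes-all competitive update for a flat layer, not the paper’s exact convolutional implementation, and the function and parameter names are our own:

```python
import numpy as np

def hwta_update(W, x, lr=0.01):
    """One Hebbian Winner-Takes-All step on a flat layer (sketch).
    W: (n_units, n_inputs) weight matrix; x: (n_inputs,) input vector."""
    y = W @ x                          # each unit's response to the input
    winner = int(np.argmax(y))         # competition: only the strongest unit learns
    W[winner] += lr * (x - W[winner])  # move the winner's weights toward x
    return W, winner
```

Repeated over a dataset, each unit’s weight vector drifts toward the centroid of the inputs it wins, yielding a set of template-like features.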
1.1.2 Comparison of SHC and SGD
Table 8 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective for classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although convergence remains fast, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
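For illustration, a minimal form that a Supervised Hebbian Classifier update could take is sketched below: the class label acts as a teacher signal that selects which output unit learns, replacing the winner-takes-all competition of the unsupervised rule. This is a plausible sketch with hypothetical names, not necessarily the paper’s exact SHC formulation:

```python
import numpy as np

def shc_update(W, x, label, lr=0.01):
    """Supervised Hebbian step (sketch): the teacher (label) picks the learning unit.
    W: (n_classes, n_features); x: (n_features,) feature vector."""
    W[label] += lr * (x - W[label])   # pull the class prototype toward x
    return W

def shc_predict(W, x):
    """Classify by the most responsive output unit."""
    return int(np.argmax(W @ x))
```

Under this rule each row of W converges toward the mean feature vector of its class, which is consistent with the observation that SHC works best on abstract, well-separated high-level features.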
1.2 CIFAR10
1.2.1 Hybrid network models
In Table 9, we report the results obtained on the CIFAR10 test set with hybrid networks. The table, which has the same structure as that of the previous subsection, shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. Both HPCA and HWTA are competitive with full backprop training when they are used to train the first or the fifth network layer. A small but more noticeable drop is observed when inner layers are switched from backprop to Hebbian. In the subsequent rows, more layers are switched from backprop to Hebbian training and a larger performance drop is observed, but the HPCA approach performs better than HWTA as more Hebbian layers are involved. The most prominent difference appears when all the deep network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of 15 percentage points over HWTA.
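For context, the classical rule behind Hebbian PCA is Sanger’s Generalized Hebbian Algorithm (Sanger 1989, cited above), whose update extracts principal components with purely local computations. Below is a minimal sketch for a flat layer, assuming zero-mean inputs; it is not the paper’s exact convolutional HPCA variant:

```python
import numpy as np

def gha_update(W, x, lr=0.001):
    """One step of Sanger's Generalized Hebbian Algorithm (sketch).
    W: (n_components, n_inputs); x: (n_inputs,) zero-mean input.
    Row i converges toward the i-th principal component of the inputs."""
    y = W @ x
    # Each unit i reconstructs x using units 0..i only (the lower-triangular
    # term), so successive rows learn successive principal directions.
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
```

With a single output row this reduces to Oja’s rule, whose weight vector converges to the leading eigenvector of the input covariance with unit norm.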
1.2.2 Comparison of SHC and SGD
Table 10 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective for classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although convergence remains fast, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
1.3 CIFAR100
1.3.1 Hybrid network models
In Table 11, we report the results obtained on the CIFAR100 test set with hybrid networks. The table, which has the same structure as those of the previous subsections, shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents our network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. HWTA is competitive with full backprop when it is used to train the first or the fifth network layer. A small but more noticeable drop is observed when inner layers are switched from backprop to HWTA. The HPCA approach, on the other hand, generally performs better than HWTA. In particular, it slightly outperforms full backprop (by 2 percentage points) when used to train the fifth network layer, suggesting that this kind of hybrid combination might be useful for more complex tasks. In the subsequent rows, more layers are switched from backprop to Hebbian training and a larger performance drop is observed, but HPCA still behaves better than HWTA. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of 22 percentage points over HWTA.
1.3.2 Comparison of SHC and SGD
Table 12 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. In this case, SHC achieves accuracy comparable to SGD (even with a slight improvement of 6 percentage points on layer 3) while requiring fewer training epochs, suggesting that the approach might be especially useful for more complex tasks. On the other hand, lower performance is observed with CHL in this case, suggesting that this approach has more difficulty scaling to more complex datasets.
About this article
Cite this article
Lagani, G., Falchi, F., Gennaro, C. et al. Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks. Neural Comput & Applic 34, 6503–6519 (2022). https://doi.org/10.1007/s00521-021-06701-4