Abstract
In this paper, we investigate Hebbian learning strategies applied to Convolutional Neural Network (CNN) training. We consider two unsupervised learning approaches: Hebbian Winner-Takes-All (HWTA) and Hebbian Principal Component Analysis (HPCA). The Hebbian learning rules are used to train the layers of a CNN in order to extract features that are then used for classification, without requiring backpropagation (backprop). Experimental comparisons are made with state-of-the-art unsupervised (but backprop-based) Variational Auto-Encoder (VAE) training. For completeness, we consider two supervised Hebbian learning variants (Supervised Hebbian Classifiers, SHC, and Contrastive Hebbian Learning, CHL) for training the final classification layer, which are compared to Stochastic Gradient Descent (SGD) training. We also investigate hybrid learning methodologies, in which some network layers are trained following the Hebbian approach and others are trained by backprop. We tested our approaches on the MNIST, CIFAR10, and CIFAR100 datasets. Our results suggest that Hebbian learning is generally suitable for training early feature extraction layers, or for retraining higher network layers in fewer training epochs than backprop. Moreover, our experiments show that Hebbian learning outperforms VAE training, with HPCA generally performing better than HWTA.
Notes
The code to reproduce the experiments is available at: github.com/GabrieleLagani/HebbianPCA/tree/hebbpca.
References
Amato G, Carrara F, Falchi F, Gennaro C, Lagani G (2019) Hebbian learning meets deep convolutional neural networks. In: International conference on image analysis and processing. Springer, pp 324–334
Bahroun Y, Soltoggio A (2017) Online representation learning with single and multi-layer hebbian networks for image classification. In: International conference on artificial neural networks. Springer, pp 354–363
Becker S, Plumbley M (1996) Unsupervised neural network learning procedures for feature extraction and classification. Appl Intell 6(3):185–203
Diehl PU, Cook M (2015) Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Front Comput Neurosci 9:99
Ferré P, Mamalet F, Thorpe SJ (2018) Unsupervised feature learning with winner-takes-all based STDP. Front Comput Neurosci 12:24
Földiak P (1989) Adaptive network for optimal linear feature extraction. In: Proceedings of IEEE/INNS international joint conference on neural networks, vol 1, pp 401–405
Grossberg S (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol Cybern 23(3):121–134
Haykin S (2009) Neural networks and learning machines, 3rd edn. Pearson
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M (2019) Bag of tricks for image classification with convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 558–567
Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2016) beta-VAE: Learning basic visual concepts with a constrained variational framework
Hyvarinen A, Karhunen J, Oja E (2002) Independent component analysis. Stud Inf Control 11(2):205–207
Karhunen J, Joutsensalo J (1995) Generalizations of principal component analysis, optimization problems, and neural networks. Neural Netw 8(4):549–562
Kingma DP, Welling M (2013) Auto-encoding variational bayes
Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69
Kolda TG, Lewis RM, Torczon V (2003) Optimization by direct search: new perspectives on some classical and modern methods. SIAM Rev 45(3):385–482
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems
Lagani G (2019) Hebbian learning algorithms for training convolutional neural networks. Master’s thesis, School of Engineering, University of Pisa, Italy. https://etd.adm.unipi.it/theses/available/etd-03292019-220853/
LeCun Y, Bottou L, Bengio Y, Haffner P et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Miconi T, Clune J, Stanley KO (2018) Differentiable plasticity: training plastic neural networks with backpropagation
Movellan JR (1991) Contrastive hebbian learning in the continuous hopfield model. In: Connectionist models. Elsevier, pp 10–17
Olshausen BA (1996) Learning linear, sparse, factorial codes
Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607
O’Reilly RC (1996) Biologically plausible error-driven learning using local activation differences: the generalized recirculation algorithm. Neural Comput 8(5):895–938
O’Reilly RC (2001) Generalization in interactive networks: the benefits of inhibitory competition and hebbian learning. Neural Comput 13(6):1199–1241
O’Reilly RC, Munakata Y (2000) Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press
Pehlevan C, Chklovskii DB (2015) Optimization theory of Hebbian/anti-Hebbian networks for PCA and whitening. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, pp 1458–1465
Pehlevan C, Hu T, Chklovskii DB (2015) A hebbian/anti-hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data. Neural Comput 27(7):1461–1495
Ponulak F (2005) ReSuMe—new supervised learning method for spiking neural networks. Technical report, Institute of Control and Information Engineering, Poznan University of Technology
Rozell CJ, Johnson DH, Baraniuk RG, Olshausen BA (2008) Sparse coding via thresholding and local competition in neural circuits. Neural Comput 20(10):2526–2563
Rumelhart DE, Zipser D (1985) Feature discovery by competitive learning. Cogn Sci 9(1):75–112
Sanger TD (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 2(6):459–473
Shrestha A, Ahmed K, Wang Y, Qiu Q (2017) Stable spike-timing dependent plasticity rule for multilayer unsupervised and supervised learning. In: International joint conference on neural networks (IJCNN). IEEE, pp 1999–2006
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484
Wadhwa A, Madhow U (2016) Bottom-up deep learning using the hebbian principle
Wadhwa A, Madhow U (2016) Learning sparse, distributed representations using the hebbian principle
Xie J, Girshick R, Farhadi A (2016) Unsupervised deep embedding for clustering analysis. In: International conference on machine learning, pp 478–487
Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks?
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was partially supported by the H2020 project AI4EU under GA 825619 and by the H2020 project AI4Media under GA 951911.
Appendices
Appendix 1: Supplementary results
In this Appendix, we present additional results on the MNIST, CIFAR10, and CIFAR100 datasets. Tables 7, 9, and 11 show the results of hybrid training, in which some network layers are trained by supervised backprop and others with the Hebbian approach. Tables 8, 10, and 12 show the results of SHC and CHL classifiers, compared with SGD classifiers, trained on the features extracted from the various layers of a pre-trained network.
1.1 MNIST
1.1.1 Hybrid network models
In Table 7, we report the results obtained on the MNIST test set with hybrid networks. Each row reports the results for a network with a different combination of Hebbian and backprop layers (the first row below the header represents the network fully trained with backprop). We use the letter “H” to denote layers trained with the Hebbian approach and the letter “B” for layers trained with backprop. The letter “G” denotes the final classifier (corresponding to the sixth layer), which was trained with SGD in all cases in order to make comparisons on an equal footing. The last two columns show the accuracy obtained with the corresponding combination of layers.
Table 7 allows us to understand the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. Both HPCA and HWTA achieve results comparable to full backprop training. A result slightly higher than full backprop is observed when layer 5 is replaced, suggesting that some combinations of layers might actually help performance. In the subsequent rows, more layers are switched from backprop to Hebbian training and a slight performance drop is observed, but the HPCA approach generally performs better than HWTA as more Hebbian layers are involved. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of more than 2 percentage points over HWTA.
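As an illustrative aside, the core of the unsupervised HWTA rule used for the “H” layers can be sketched in a few lines. This is a generic winner-takes-all competitive update for a flat layer, not the paper’s exact convolutional implementation, and the function and parameter names are our own:

```python
import numpy as np

def hwta_update(W, x, lr=0.01):
    """One Hebbian Winner-Takes-All step on a flat layer (sketch).
    W: (n_units, n_inputs) weight matrix; x: (n_inputs,) input vector."""
    y = W @ x                          # each unit's response to the input
    winner = int(np.argmax(y))         # competition: only the strongest unit learns
    W[winner] += lr * (x - W[winner])  # move the winner's weights toward x
    return W, winner
```

Repeated over a dataset, each unit’s weight vector drifts toward the centroid of the inputs it wins, yielding a set of template-like features.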
1.1.2 Comparison of SHC and SGD
Table 8 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective for classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although convergence remains fast, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
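For illustration, a minimal form that a Supervised Hebbian Classifier update could take is sketched below: the class label acts as a teacher signal that selects which output unit learns, replacing the winner-takes-all competition of the unsupervised rule. This is a plausible sketch with hypothetical names, not necessarily the paper’s exact SHC formulation:

```python
import numpy as np

def shc_update(W, x, label, lr=0.01):
    """Supervised Hebbian step (sketch): the teacher (label) picks the learning unit.
    W: (n_classes, n_features); x: (n_features,) feature vector."""
    W[label] += lr * (x - W[label])   # pull the class prototype toward x
    return W

def shc_predict(W, x):
    """Classify by the most responsive output unit."""
    return int(np.argmax(W @ x))
```

Under this rule each row of W converges toward the mean feature vector of its class, which is consistent with the observation that SHC works best on abstract, well-separated high-level features.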
1.2 CIFAR10
1.2.1 Hybrid network models
In Table 9, we report the results obtained on the CIFAR10 test set with hybrid networks. The table, which has the same structure as that of the previous subsection, shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents the network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. Both HPCA and HWTA are competitive with full backprop training when they are used to train the first or the fifth network layer. A small but more noticeable drop is observed when inner layers are switched from backprop to Hebbian. In the subsequent rows, more layers are switched from backprop to Hebbian training and a larger performance drop is observed, but the HPCA approach performs better than HWTA as more Hebbian layers are involved. The most prominent difference appears when all the deep network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of 15 percentage points over HWTA.
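For context, the classical rule behind Hebbian PCA is Sanger’s Generalized Hebbian Algorithm (Sanger 1989, cited above), whose update extracts principal components with purely local computations. Below is a minimal sketch for a flat layer, assuming zero-mean inputs; it is not the paper’s exact convolutional HPCA variant:

```python
import numpy as np

def gha_update(W, x, lr=0.001):
    """One step of Sanger's Generalized Hebbian Algorithm (sketch).
    W: (n_components, n_inputs); x: (n_inputs,) zero-mean input.
    Row i converges toward the i-th principal component of the inputs."""
    y = W @ x
    # Each unit i reconstructs x using units 0..i only (the lower-triangular
    # term), so successive rows learn successive principal directions.
    return W + lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
```

With a single output row this reduces to Oja’s rule, whose weight vector converges to the leading eigenvector of the input covariance with unit norm.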
1.2.2 Comparison of SHC and SGD
Table 10 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. The results suggest that SHC is effective for classifying high-level features, achieving accuracy comparable to SGD while requiring fewer training epochs. On the other hand, SHC is less effective on lower-layer features, although convergence remains fast, suggesting that the supervised Hebbian approach benefits from more abstract latent representations. CHL appears to perform comparably to SGD training.
1.3 CIFAR100
1.3.1 Hybrid network models
In Table 11, we report the results obtained on the CIFAR100 test set with hybrid networks. The table, which has the same structure as those of the previous subsections, shows the effect of switching a specific layer (or group of layers) in a network from backprop to Hebbian training. The first row represents our network fully trained with backprop. The next rows show the results of networks in which a single layer was switched. HWTA is competitive with full backprop when it is used to train the first or the fifth network layer. A small but more noticeable drop is observed when inner layers are switched from backprop to HWTA. The HPCA approach, on the other hand, generally performs better than HWTA. In particular, it slightly outperforms full backprop (by 2 percentage points) when used to train the fifth network layer, suggesting that this kind of hybrid combination might be useful for more complex tasks. In the subsequent rows, more layers are switched from backprop to Hebbian training and a larger performance drop is observed, but HPCA still behaves better than HWTA. The most prominent difference appears when all the network layers are replaced with their Hebbian equivalents, in which case HPCA shows an increase of 22 percentage points over HWTA.
1.3.2 Comparison of SHC and SGD
Table 12 shows a comparison between SHC, CHL, and SGD classifiers placed on the various layers of a network pre-trained with backprop. In this case, SHC achieves accuracy comparable to SGD (even with a slight improvement of 6 percentage points on layer 3) while requiring fewer training epochs, suggesting that the approach might be especially useful for more complex tasks. On the other hand, lower performance is observed with CHL in this case, suggesting that this approach has more difficulty scaling to more complex datasets.
About this article
Cite this article
Lagani, G., Falchi, F., Gennaro, C. et al. Comparing the performance of Hebbian against backpropagation learning using convolutional neural networks. Neural Comput & Applic 34, 6503–6519 (2022). https://doi.org/10.1007/s00521-021-06701-4