Abstract
Multilayer perceptrons (MLPs) exhibit strange behaviors during learning, caused by singularities in their parameter space. A detailed theoretical or numerical analysis of MLPs is difficult because the traditional log-sigmoid activation function is not analytically integrable, which makes the averaged learning equations (ALEs) hard to obtain. In this paper, the error function is proposed as the activation function of the MLP. By deriving explicit expressions for two key expectations, we obtain the averaged learning equations, which enable further analysis of the learning dynamics of MLPs. Simulation results also indicate that the ALEs play a significant role in investigating the singular behaviors of MLPs.
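The appeal of the error function as an activation is that it is sigmoidal like the log-sigmoid but its Gaussian expectations admit closed forms. A minimal sketch (pure Python; the `1/sqrt(2)` rescaling, which turns `erf` into the standard normal CDF so that both activations share the same (0, 1) range, is an illustrative choice, not the paper's exact parameterization):

```python
import math

def logsigmoid(x):
    """Traditional log-sigmoid activation: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def erf_activation(x):
    """Error-function activation, rescaled to the same (0, 1) range:
    0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Both are sigmoidal: near 0 at -inf, exactly 1/2 at 0, near 1 at +inf.
for x in (-3.0, 0.0, 3.0):
    print(f"x={x:+.1f}  logsigmoid={logsigmoid(x):.4f}  erf={erf_activation(x):.4f}")
```

The two curves are close pointwise, so swapping the activation changes little in network behavior while making the expectations in the ALEs tractable.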
Acknowledgments
This project is supported by the National Natural Science Foundation of China under Grant 61374006, the Major Program of the National Natural Science Foundation of China under Grant 11190015, and the Research Fund for the Doctoral Program of Higher Education of China under Grant 20100092110020.
Appendix
From Eq. (1), we have
then
\(P_1(\mathbf{s},\mathbf{v})\) and \(P_2(\mathbf{s},\mathbf{v})\) can be rewritten as
Then we have:
where
According to the Sherman–Morrison formula, we have:
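The Sherman–Morrison formula updates a known inverse under a rank-one perturbation: \((A + uv^{\mathsf T})^{-1} = A^{-1} - \frac{A^{-1}uv^{\mathsf T}A^{-1}}{1 + v^{\mathsf T}A^{-1}u}\). A small numeric check in pure Python (the 2×2 matrix and vectors are arbitrary illustrative values):

```python
def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def sherman_morrison(Ainv, u, v):
    """(A + u v^T)^{-1} computed from A^{-1} alone."""
    Au = matvec(Ainv, u)                                   # A^{-1} u
    vA = [sum(v[i] * Ainv[i][j] for i in range(2)) for j in range(2)]  # v^T A^{-1}
    denom = 1.0 + sum(v[i] * Au[i] for i in range(2))      # 1 + v^T A^{-1} u
    return [[Ainv[i][j] - Au[i] * vA[j] / denom for j in range(2)] for i in range(2)]

# Compare against direct inversion of A + u v^T.
A = [[3.0, 1.0], [1.0, 2.0]]
u, v = [1.0, -1.0], [0.5, 2.0]
direct = inv2([[A[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)])
via_sm = sherman_morrison(inv2(A), u, v)
print(max(abs(direct[i][j] - via_sm[i][j]) for i in range(2) for j in range(2)))
```

In the derivation this avoids inverting the full covariance matrix after a rank-one modification.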
By using Sylvester’s determinant theorem, we have:
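Sylvester's determinant theorem states \(\det(I_m + AB) = \det(I_n + BA)\) for \(A \in \mathbb{R}^{m \times n}\), \(B \in \mathbb{R}^{n \times m}\), letting a large determinant be evaluated via a much smaller one. A numeric check with \(m=1\), \(n=2\) (the vectors are arbitrary illustrative values):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# A is 1x2 and B is 2x1, so AB is a scalar (1x1) while BA is 2x2.
a = [2.0, -1.0]   # row vector A
b = [0.5, 3.0]    # column vector B

lhs = 1.0 + sum(a[i] * b[i] for i in range(2))              # det(I_1 + AB)
BA = [[b[i] * a[j] for j in range(2)] for i in range(2)]    # 2x2 outer product BA
rhs = det2([[(1.0 if i == j else 0.0) + BA[i][j] for j in range(2)]
            for i in range(2)])                              # det(I_2 + BA)
print(abs(lhs - rhs))
```

This is what lets a determinant over the input dimension collapse to a low-dimensional one in the expectation formulas.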
According to the Leibniz integral rule, the following equation holds:
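The Leibniz integral rule allows differentiation under the integral sign: \(\frac{d}{dt}\int_a^b f(x,t)\,dx = \int_a^b \frac{\partial f}{\partial t}(x,t)\,dx\) for fixed limits. A numeric sanity check in pure Python, using the arbitrary illustrative integrand \(f(x,t) = e^{-tx^2}\):

```python
import math

def f(x, t):
    return math.exp(-t * x * x)

def df_dt(x, t):
    return -x * x * math.exp(-t * x * x)   # partial derivative of f in t

def trapz(g, a, b, n=2000):
    """Composite trapezoidal quadrature of g on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * g(a) + sum(g(a + i * h) for i in range(1, n)) + 0.5 * g(b))

t, eps = 1.0, 1e-5
# Left side: central finite difference of the integral with respect to t.
lhs = (trapz(lambda x: f(x, t + eps), 0.0, 1.0)
       - trapz(lambda x: f(x, t - eps), 0.0, 1.0)) / (2.0 * eps)
# Right side: integrate the t-derivative of the integrand directly.
rhs = trapz(lambda x: df_dt(x, t), 0.0, 1.0)
print(abs(lhs - rhs))
```

The two sides agree to within finite-difference error, which is the exchange of derivative and integral used to relate \(P_1\) and \(P_2\) below.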
From (37), we can obtain \(P_1\) by integrating \(P_2\) with respect to \(\mathbf{v}\), so \(P_1\) takes the following form:
where C is a constant.
From (28), we know that \(P_1(\mathbf{0},\mathbf{0})=\frac{1}{4}\), so \(C=\frac{\pi}{2}\). Finally, we get
Cite this article
Guo, W., Wei, H., Zhao, J. et al. Averaged learning equations of error-function-based multilayer perceptrons. Neural Comput & Applic 25, 825–832 (2014). https://doi.org/10.1007/s00521-014-1557-5