Abstract
Artificial neural networks are popular machine learning techniques that simulate the mechanism of learning in biological organisms. The human nervous system contains cells called neurons, which are connected to one another by axons and dendrites; the regions where axons and dendrites connect are referred to as synapses. These connections are illustrated in Figure 1.1(a). The strengths of synaptic connections often change in response to external stimuli, and it is through such changes that learning takes place in living organisms.
“Thou shalt not make a machine to counterfeit a human mind.” —Frank Herbert
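The analogy between synaptic strengths and learned weights can be made concrete with a single artificial neuron. The following minimal sketch (a generic perceptron-style update in Python, not code from the chapter; all names and data are illustrative) adjusts a weight vector whenever a training example, the computational analogue of an external stimulus, is misclassified.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=0.1):
    """Learn weights for a single neuron; labels y are -1 or +1."""
    w = np.zeros(X.shape[1])  # one weight per input connection ("synapse")
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_hat = 1.0 if (w @ x_i + b) > 0 else -1.0  # current prediction
            if y_hat != y_i:             # a stimulus the neuron got wrong
                w += lr * y_i * x_i      # strengthen/weaken connections
                b += lr * y_i
    return w, b

# Example: a linearly separable AND-style pattern.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., -1., -1., 1.])
w, b = train_perceptron(X, y)
print(w, b, np.where(X @ w + b > 0, 1.0, -1.0))
```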
Notes
1. The ReLU shows asymmetric saturation; a short numerical sketch follows these notes.
2.
3. Weight decay is generally used with other loss functions in single-layer models and in all multi-layer models with a large number of parameters; a minimal gradient-step sketch also follows these notes.
4. This is an overloading of the terminology used in convolutional neural networks. The meaning of the word “depth” is inferred from the context in which it is used.
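To see what the asymmetric saturation of note 1 means, compare the derivative of the ReLU with that of the sigmoid: the ReLU's gradient is exactly zero for every negative pre-activation (one-sided saturation) but exactly one for every positive one, whereas the sigmoid's gradient vanishes at both extremes. A minimal numerical sketch (illustrative, not from the chapter):

```python
import numpy as np

relu_grad = lambda x: float(x > 0)                # derivative of max(0, x)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = lambda x: sigmoid(x) * (1.0 - sigmoid(x))

for x in [-10.0, -1.0, 1.0, 10.0]:
    print(f"x={x:+5.1f}  relu'={relu_grad(x):.3f}  sigmoid'={sigmoid_grad(x):.6f}")
# relu' is 0.000 for every negative x (saturated on one side only) and
# 1.000 for every positive x, while sigmoid' vanishes at BOTH extremes.
```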
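The weight decay of note 3 corresponds to adding an L2 penalty (lambda/2)*||w||^2 to whatever loss is being minimized; the penalty contributes lambda*w to the gradient, so every step shrinks the weights slightly in addition to fitting the data. A minimal single-layer sketch (illustrative; the learning rate, decay constant, and data are made up):

```python
import numpy as np

def sgd_step(w, x, y, lr=0.01, lam=0.1):
    y_hat = w @ x                 # single-layer (linear) prediction
    grad_loss = (y_hat - y) * x   # gradient of the squared loss 0.5*(y_hat - y)**2
    grad_decay = lam * w          # gradient of the penalty (lam/2)*||w||**2
    return w - lr * (grad_loss + grad_decay)

w = np.array([2.0, -3.0])         # hypothetical initial weights
x = np.array([1.0, 0.5])          # hypothetical training example
y = 1.0
for _ in range(5):
    w = sgd_step(w, x, y)
print(w)  # weights are pulled toward zero in addition to fitting y
```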
Copyright information
Ā© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Aggarwal, C.C. (2018). An Introduction to Neural Networks. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94463-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94462-3
Online ISBN: 978-3-319-94463-0
eBook Packages: Computer Science; Computer Science (R0)