Abstract
Deep neural networks have achieved great success in a variety of real-world applications, and many algorithmic and implementation techniques have been developed for them; however, the theoretical understanding of many aspects of deep neural networks remains far from clear. A particularly interesting issue is the usefulness of dropout, which was motivated by the intuition of preventing complex co-adaptation of feature detectors. In this paper, we study the Rademacher complexity of different types of dropout. Our theoretical results show that for shallow neural networks (with one hidden layer or none) dropout reduces the Rademacher complexity polynomially, whereas for deep neural networks it can, remarkably, lead to an exponential reduction.
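For readers unfamiliar with the central quantity, the empirical Rademacher complexity of a hypothesis class F over a sample S = {x_1, ..., x_m} is, in the standard form used throughout the learning-theory literature (stated here for context, not reproduced from the paper),

\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\epsilon}\left[ \sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \epsilon_i f(x_i) \right],

where the \epsilon_i are i.i.d. Rademacher variables taking the values +1 and -1 with equal probability; smaller complexity translates into tighter generalization bounds. As a reminder of the mechanism being analyzed, the following is a minimal sketch of standard (inverted) Bernoulli dropout at training time; the function name and the keep_prob parameter are illustrative and not taken from the paper.

import numpy as np

def dropout(x, keep_prob=0.5, rng=None):
    # Drop each unit independently with probability 1 - keep_prob and
    # rescale the survivors by 1 / keep_prob, so that the expected
    # activation is unchanged; at test time the layer is the identity.
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < keep_prob  # Bernoulli(keep_prob) keep-mask
    return x * mask / keep_prob

# Example: on average half of the hidden activations are zeroed out.
hidden = np.ones(8)
print(dropout(hidden, keep_prob=0.5))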
Cite this article
Gao, W., Zhou, Z.-H. Dropout Rademacher complexity of deep neural networks. Sci. China Inf. Sci. 59, 072104 (2016). https://doi.org/10.1007/s11432-015-5470-z