
Stateful Optimization in Federated Learning of Neural Networks

  • Conference paper
  • First Online:

Intelligent Data Engineering and Automated Learning – IDEAL 2020 (IDEAL 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12490)


Abstract

Federated learning (FL) is an emerging branch of machine learning research that studies methods for training models over geographically separated, unbalanced and non-IID data. In FL, as in single-node training, the almost exclusively used method for non-convex problems is mini-batch gradient descent. In this work we examine the effect of using stateful optimization methods in a federated environment. Our empirical results show that, at the cost of synchronizing the optimizer's state variables along with the model parameters, a significant improvement can be achieved.
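The mechanism the abstract refers to can be sketched as follows: each client runs a stateful optimizer locally (here, SGD with momentum), and the server averages the optimizer's state buffers together with the model parameters at every communication round. This is a minimal illustrative sketch, not the authors' exact algorithm; the function names, toy clients and hyper-parameter values below are hypothetical stand-ins.

import numpy as np

def local_sgd_momentum(w0, m0, grad_fn, steps=5, lr=0.05, beta=0.9):
    # Local training on one client: SGD with a momentum buffer, i.e. the
    # "state" that a stateful optimizer carries between steps.
    w, m = np.array(w0, dtype=float), np.array(m0, dtype=float)
    for _ in range(steps):
        g = grad_fn(w)       # stand-in for a mini-batch gradient on the client's private data
        m = beta * m + g     # update the optimizer state
        w = w - lr * m
    return w, m

def federated_round(global_w, global_m, client_grad_fns, steps=5):
    # One communication round: every client starts from the global model AND the
    # global optimizer state, trains locally, and the server averages both.
    results = [local_sgd_momentum(global_w, global_m, f, steps) for f in client_grad_fns]
    ws, ms = zip(*results)
    return np.mean(ws, axis=0), np.mean(ms, axis=0)

if __name__ == "__main__":
    # Two toy clients with different quadratic objectives, a crude stand-in for non-IID data.
    clients = [lambda w: 2.0 * (w - 3.0), lambda w: 2.0 * (w + 1.0)]
    w, m = np.zeros(1), np.zeros(1)
    for _ in range(30):
        w, m = federated_round(w, m, clients)
    print("global parameter after 30 rounds:", w)  # should approach the average optimum near w = 1

Synchronizing the momentum buffer alongside the weights is what distinguishes this from plain federated averaging of a stateless optimizer: without it, each client would restart its local optimizer state from zero every round.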


Notes

  1. Hyper-parameters are denoted following the Keras documentation: https://keras.io/api/optimizers/.
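For concreteness, these are the hyper-parameter names that notation refers to for common stateful Keras optimizers; the values shown are illustrative defaults, not the paper's settings.

from tensorflow import keras

# Hyper-parameter names as exposed by the Keras optimizer API.
sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=False)
adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7)
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-7)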


Acknowledgements

Project no. ED_18-1-2019-0030 (Application domain specific highly reliable IT solutions subprogramme) has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme funding scheme.

Author information

Correspondence to Péter Kiss.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kiss, P., Horváth, T., Felbab, V. (2020). Stateful Optimization in Federated Learning of Neural Networks. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. IDEAL 2020. Lecture Notes in Computer Science, vol 12490. Springer, Cham. https://doi.org/10.1007/978-3-030-62365-4_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-62365-4_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62364-7

  • Online ISBN: 978-3-030-62365-4

  • eBook Packages: Computer Science, Computer Science (R0)
