
Stateful Optimization in Federated Learning of Neural Networks

  • Conference paper
  • First Online:

Intelligent Data Engineering and Automated Learning – IDEAL 2020 (IDEAL 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12490)


Abstract

Federated learning (FL) is an emerging branch of machine learning research that studies methods for training models over geographically separated, unbalanced and non-IID data. In FL, as in single-node training, the almost exclusively used method for non-convex problems is mini-batch gradient descent. In this work we examine the effect of using stateful optimization methods in a federated environment. Our empirical results show that, at the cost of synchronizing the optimizer's state variables along with the model parameters, a significant improvement can be achieved.
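The mechanism the abstract refers to can be sketched as follows: each client runs a stateful optimizer locally (here, SGD with momentum), and the server averages the optimizer's state buffers together with the model parameters at every communication round. This is a minimal illustrative sketch, not the authors' exact algorithm; the function names, toy clients and hyper-parameter values below are hypothetical stand-ins.

import numpy as np

def local_sgd_momentum(w0, m0, grad_fn, steps=5, lr=0.05, beta=0.9):
    # Local training on one client: SGD with a momentum buffer, i.e. the
    # "state" that a stateful optimizer carries between steps.
    w, m = np.array(w0, dtype=float), np.array(m0, dtype=float)
    for _ in range(steps):
        g = grad_fn(w)       # stand-in for a mini-batch gradient on the client's private data
        m = beta * m + g     # update the optimizer state
        w = w - lr * m
    return w, m

def federated_round(global_w, global_m, client_grad_fns, steps=5):
    # One communication round: every client starts from the global model AND the
    # global optimizer state, trains locally, and the server averages both.
    results = [local_sgd_momentum(global_w, global_m, f, steps) for f in client_grad_fns]
    ws, ms = zip(*results)
    return np.mean(ws, axis=0), np.mean(ms, axis=0)

if __name__ == "__main__":
    # Two toy clients with different quadratic objectives, a crude stand-in for non-IID data.
    clients = [lambda w: 2.0 * (w - 3.0), lambda w: 2.0 * (w + 1.0)]
    w, m = np.zeros(1), np.zeros(1)
    for _ in range(30):
        w, m = federated_round(w, m, clients)
    print("global parameter after 30 rounds:", w)  # should approach the average optimum near w = 1

Synchronizing the momentum buffer alongside the weights is what distinguishes this from plain federated averaging of a stateless optimizer: without it, each client would restart its local optimizer state from zero every round.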


Notes

  1. Hyper-parameters are denoted following the Keras documentation: https://keras.io/api/optimizers/.
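For concreteness, these are the hyper-parameter names that notation refers to for common stateful Keras optimizers; the values shown are illustrative defaults, not the paper's settings.

from tensorflow import keras

# Hyper-parameter names as exposed by the Keras optimizer API.
sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=False)
adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7)
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-7)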


Acknowledgements

Project no. ED_18-1-2019-0030 (Application domain specific highly reliable IT solutions subprogramme) has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the Thematic Excellence Programme funding scheme.

Author information

Correspondence to Péter Kiss.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kiss, P., Horváth, T., Felbab, V. (2020). Stateful Optimization in Federated Learning of Neural Networks. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. IDEAL 2020. Lecture Notes in Computer Science, vol 12490. Springer, Cham. https://doi.org/10.1007/978-3-030-62365-4_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-62365-4_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62364-7

  • Online ISBN: 978-3-030-62365-4

  • eBook Packages: Computer Science, Computer Science (R0)
