Efficient and Fair Data Valuation for Horizontal Federated Learning

Wei, Shuyue; Tong, Yongxin; Zhou, Zimu; Song, Tianshu

doi:10.1007/978-3-030-63076-8_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12500)

Abstract

The availability of big data is crucial for modern machine learning applications and services. Federated learning is an emerging paradigm that unites different data owners for machine learning on massive data sets without worrying about data privacy. Yet data owners may still be reluctant to contribute unless their data sets are fairly valued and paid for. In this work, we adapt the Shapley value, a widely used data valuation metric, to valuing data providers in federated learning. Prior data valuation schemes for machine learning incur high computational cost because they require training extra models on all combinations of data sets. For efficient data valuation, we approximately construct all the models necessary for data valuation using the gradients from training a single model, rather than training an exponential number of models from scratch. On this basis, we devise three methods for efficient contribution index estimation. Evaluations show that our methods accurately approximate the contribution index while notably accelerating its calculation.
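To make the idea in the abstract concrete, the following is a minimal Python sketch, not the chapter's three estimation methods. It computes exact Shapley values over all coalitions of data owners, and it approximates each coalition's model by averaging updates recorded during a single training round instead of retraining. The function names, the one-round setup, the toy updates and the squared-error utility are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley value of each player for a coalition utility function."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain from adding p to coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def reconstruct_model(initial, updates, coalition):
    """Hypothetical helper: approximate the model a coalition would obtain
    by averaging only its members' recorded updates (one round, no retraining)."""
    if not coalition:
        return list(initial)
    dim = len(initial)
    mean = [sum(updates[o][i] for o in coalition) / len(coalition)
            for i in range(dim)]
    return [w + d for w, d in zip(initial, mean)]


# Toy setup (assumed): three owners, a two-parameter model, one recorded update each.
initial = [0.0, 0.0]
updates = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 0.0]}
target = [1.0, 1.0]


def utility(coalition):
    """Assumed utility: negative squared distance of the reconstructed model to a target."""
    model = reconstruct_model(initial, updates, coalition)
    return -sum((m - t) ** 2 for m, t in zip(model, target))


contribution = shapley_values(list(updates), utility)
for owner, value in contribution.items():
    print(owner, round(value, 4))
```

In this toy setup, owners A and B supply symmetric, useful updates and receive equal values, while owner C's zero update dilutes the average and receives a lower value. The exact computation still enumerates every coalition, so it only scales to a handful of owners; the saving illustrated here is that no coalition model is trained from scratch.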

Acknowledgment

We are grateful to the reviewers for their constructive comments. This work is partially supported by the National Key Research and Development Program of China under Grant No. 2018AAA0101100 and the National Science Foundation of China (NSFC) under Grant Nos. 61822201 and U1811463. Yongxin Tong is the corresponding author of this chapter.

Author information

Authors and Affiliations

SKLSDE Lab, BDBC and IRI, Beihang University, Beijing, China
Shuyue Wei, Yongxin Tong & Tianshu Song
School of Information Systems, Singapore Management University, Singapore, Singapore
Zimu Zhou

Corresponding author

Correspondence to Yongxin Tong.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, Hong Kong
Qiang Yang
WeBank, Shenzhen, China
Lixin Fan
Nanyang Technological University, Singapore, Singapore
Han Yu

About this chapter

Cite this chapter

Wei, S., Tong, Y., Zhou, Z., Song, T. (2020). Efficient and Fair Data Valuation for Horizontal Federated Learning. In: Yang, Q., Fan, L., Yu, H. (eds) Federated Learning. Lecture Notes in Computer Science, vol 12500. Springer, Cham. https://doi.org/10.1007/978-3-030-63076-8_10

DOI: https://doi.org/10.1007/978-3-030-63076-8_10
Published: 26 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63075-1
Online ISBN: 978-3-030-63076-8
eBook Packages: Computer Science, Computer Science (R0)
