Abstract
We consider a number of fundamental statistical and graph problems in the message-passing model, where we have \(k\) machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the \(k\) data sets. The communication is point-to-point, and the goal is to minimize the total communication among the \(k\) machines. This model captures all point-to-point distributed computational models with respect to minimizing communication costs. Our analysis shows that exact computation of many statistical and graph problems in this distributed setting requires a prohibitively large amount of communication, and often one cannot improve upon the communication of the simple protocol in which all machines send their data to a centralized server. Thus, in order to obtain protocols that are communication-efficient, one has to allow approximation, or investigate the distribution or layout of the data sets.
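As a minimal sketch of the baseline mentioned above, in which all machines send their data to a centralized server, the following Python simulation counts the total communication of that simple protocol. The function name `naive_protocol`, the choice of counting distinct elements as the example problem, and the fixed-width encoding of `universe_bits` bits per element are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Set, Tuple


def naive_protocol(sites: List[Set[int]], universe_bits: int) -> Tuple[int, int]:
    """Simulate every site shipping its whole data set to a coordinator.

    Returns (number of distinct elements in the union, total bits sent).
    """
    union: Set[int] = set()
    total_bits = 0
    for data in sites:
        # Assumption: each element is sent as a fixed-width word of
        # universe_bits bits over the point-to-point link.
        total_bits += len(data) * universe_bits
        union |= data
    return len(union), total_bits


# Three hypothetical sites holding overlapping pieces of data.
sites = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]
distinct, bits = naive_protocol(sites, universe_bits=3)
print(distinct, bits)
```

In this sketch the cost grows with the total size of the data held across the sites, which is the amount of communication that the abstract says often cannot be improved upon for exact computation.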
Notes
In the comparison we neglect the constants hidden in the big-\(O\) and big-\(\varOmega\) notation, which should be small.
We can also choose, for example, \(P_1\) to be the coordinator and avoid the need for an additional site, though having an additional site makes the notation cleaner.
We conjectured Theorem 5 in the conference version of this paper.
Additional information
A preliminary version of this article appeared in Proceedings of the 27th International Symposium on Distributed Computing.
Cite this article
Woodruff, D.P., Zhang, Q. When distributed computation is communication expensive. Distrib. Comput. 30, 309–323 (2017). https://doi.org/10.1007/s00446-014-0218-3
DOI: https://doi.org/10.1007/s00446-014-0218-3