Efficient MapReduce algorithms for triangle listing in billion-scale graphs

Distributed and Parallel Databases

Abstract

This paper addresses the classical triangle listing problem, which aims to enumerate all sets of three vertices that are pairwise connected by edges. The problem has been studied intensively in internal and external memory, but it remains a pressing challenge in distributed environments, where multiple machines across a network can be used to achieve good performance and scalability. As one of the de facto computing methodologies in distributed environments, MapReduce has been used in some existing triangle listing algorithms. However, these algorithms usually need to shuffle a huge amount of intermediate data, which seriously hinders their scalability on large-scale graphs. In this paper, we propose a new triangle listing algorithm in MapReduce, FTL, which uses a lightweight data structure to substantially reduce the intermediate data transferred during the shuffle stage, and which is equipped with multiple-round techniques to ease the burden on memory and network bandwidth when dealing with billion-scale graphs. We prove that the size of the intermediate data is bounded close to the number of triangles in the graph. To further reduce the shuffle size and memory cost, we also propose improved algorithms based on a compact data structure, and present several optimization techniques to accelerate the computation and reduce memory consumption. Extensive experimental results show that our algorithms outperform existing competitors by several times on both synthetic and real-world graphs.
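To make the problem statement and the shuffle-size concern concrete, the following is a minimal single-process sketch of the wedge-based style of triangle listing that earlier MapReduce approaches build on. It is not FTL, whose data structures are not described in this abstract. The function name `list_triangles`, the degree-based vertex ordering, and the in-memory stand-ins for the map and reduce rounds are all assumptions made for illustration; vertex ids are assumed to be mutually comparable (for example, integers).

```python
from collections import defaultdict
from itertools import combinations


def list_triangles(edges):
    """Return (sorted list of triangles, number of intermediate wedge records).

    Illustrative baseline only: an in-memory simulation of a two-round,
    wedge-based scheme, not the FTL algorithm proposed in the paper.
    """
    # Normalise to an undirected simple graph: drop self-loops and duplicates.
    edge_set = {(min(u, v), max(u, v)) for u, v in edges if u != v}

    degree = defaultdict(int)
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1

    # Total order on vertices: by degree, ties broken by vertex id.
    def rank(v):
        return (degree[v], v)

    # Round 1, "map": orient each edge from its lower-ranked endpoint
    # to its higher-ranked endpoint, grouping by the lower-ranked one.
    higher = defaultdict(list)
    for u, v in edge_set:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        higher[low].append(high)

    # Round 1, "reduce": each vertex emits one wedge per pair of its
    # higher-ranked neighbours. These records are the intermediate data
    # that a real job would have to shuffle.
    wedges = []
    for center, nbrs in higher.items():
        for a, b in combinations(sorted(nbrs), 2):
            wedges.append(((a, b), center))

    # Round 2: a wedge closes into a triangle iff its two endpoints are adjacent.
    triangles = sorted(
        tuple(sorted((center, a, b)))
        for (a, b), center in wedges
        if (a, b) in edge_set
    )
    return triangles, len(wedges)
```

Calling `list_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])` lists the single triangle on vertices 1, 2 and 3. The second return value counts the wedge records produced between the two rounds, which is the quantity that grows quickly on skewed graphs and that the abstract identifies as the scalability bottleneck.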

Acknowledgements

This work was partially supported by grants from the National Science Foundation of China (61502349), the Hubei Provincial Natural Science Foundation of China (2015CFB339), the Scientific and Technologic Development Programme of Suzhou (SYG201442), the Research Grants Council of Hong Kong (14209314 and 14221716), a Chinese University of Hong Kong Direct Grant (4055048), and the Australian Research Council (DE140100999 and DP160101513).

Author information

Corresponding author

Correspondence to Yuanyuan Zhu.

About this article

Cite this article

Zhu, Y., Zhang, H., Qin, L. et al. Efficient MapReduce algorithms for triangle listing in billion-scale graphs. Distrib Parallel Databases 35, 149–176 (2017). https://doi.org/10.1007/s10619-017-7193-1

  • DOI: https://doi.org/10.1007/s10619-017-7193-1
