Towards k-vertex connected component discovery from large networks

World Wide Web

Abstract

In many real-life network-based applications, such as social relation analysis, Web analysis, collaborative networks, road networks and bioinformatics, discovering components with high connectivity is an important problem. In particular, the k-edge connected component (k-ECC) has recently been studied extensively as a way to discover disjoint components. Yet many real scenarios call for overlapping components, which raises further challenges. In this paper, we propose a k-vertex connected component (k-VCC) model, which is much more cohesive and thus supports overlapping between components very well. To discover k-VCCs, we propose three frameworks: top-down, bottom-up and hybrid. The top-down framework is developed first to find the exact k-VCCs by dividing the whole network. To reduce the high computational cost on large input networks, a bottom-up framework is then proposed to locally identify seed subgraphs and obtain heuristic k-VCCs by expanding and merging these seed subgraphs. Finally, the hybrid framework takes advantage of both frameworks: it uses the results of the bottom-up framework to construct a well-designed mixed graph and then discovers the exact k-VCCs by contracting the mixed graph in a top-down way. Because the mixed graph is smaller than the original network, the hybrid framework runs much faster than the top-down framework. Comprehensive experiments conducted on large real and synthetic networks demonstrate the efficiency and effectiveness of the proposed exact and heuristic approaches.
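
To make the k-VCC notion and the idea of dividing the network top-down more concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes the common definition of a k-VCC as a maximal vertex set with more than k vertices whose induced subgraph stays connected after removing any k-1 vertices. It finds a separating vertex set by brute force, which is only feasible on toy graphs, and then recurses on each side of the cut together with the cut vertices. The helper names (`make_graph`, `components`, `find_small_cut`, `k_vccs`) are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def make_graph(edges):
    """Build an undirected adjacency map {vertex: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes, seen, out = set(nodes), set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        out.append(comp)
    return out


def find_small_cut(adj, nodes, k):
    """Brute force (toy graphs only): return a vertex set of size < k
    whose removal disconnects the induced subgraph, or None."""
    for size in range(k):
        for cut in combinations(list(nodes), size):
            if len(components(adj, nodes - set(cut))) > 1:
                return set(cut)
    return None


def k_vccs(adj, k, nodes=None):
    """Top-down sketch: split on a small vertex cut until none is left."""
    nodes = set(adj) if nodes is None else set(nodes)
    if len(nodes) <= k:          # too small to be k-vertex connected
        return []
    cut = find_small_cut(adj, nodes, k)
    if cut is None:              # no cut smaller than k: this is a k-VCC
        return [frozenset(nodes)]
    result = []
    for comp in components(adj, nodes - cut):
        # Keep the cut vertices on every side, so results may overlap.
        result.extend(k_vccs(adj, k, comp | cut))
    return result


# Two 4-cliques, {0,1,2,3} and {2,3,4,5}, sharing vertices 2 and 3.
clique_a = list(combinations([0, 1, 2, 3], 2))
clique_b = list(combinations([2, 3, 4, 5], 2))
adj = make_graph(clique_a + clique_b)
print(sorted(sorted(c) for c in k_vccs(adj, 3)))
# prints [[0, 1, 2, 3], [2, 3, 4, 5]]
```

In this toy graph the two 3-VCCs share vertices 2 and 3, which illustrates the overlap between components that a vertex-based model permits. A practical implementation would replace the brute-force cut search with a flow-based vertex connectivity test.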

Notes

  1. http://thebiogrid.org

  2. http://snap.stanford.edu

Acknowledgments

This research is partially supported by the National NSFC (61672041, 61772124, 61732003, 61902004, 61977001), the National Key Research and Development Program of China (2018YFB1004402), the Start-up Funds of North China University of Technology, and the National Research Foundation, Prime Minister's Office, Singapore, under its International Research Centres in Singapore Funding Initiative and the Pinnacle Lab for Analytics at SMU.

Author information

Corresponding author

Correspondence to Yuan Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Graph Data Management in Online Social Networks

Guest Editors: Kai Zheng, Guanfeng Liu, Mehmet A. Orgun, and Junping Du

About this article

Cite this article

Li, Y., Wang, G., Zhao, Y. et al. Towards k-vertex connected component discovery from large networks. World Wide Web 23, 799–830 (2020). https://doi.org/10.1007/s11280-019-00725-6

  • DOI: https://doi.org/10.1007/s11280-019-00725-6
