Abstract
In many real life network-based applications such as social relation analysis, Web analysis, collaborative network, road network and bioinformatics, the discovery of components with high connectivity is an important problem. In particular, k-edge connected component (k-ECC) has recently been extensively studied to discover disjoint components. Yet many real scenarios present more needs and challenges for overlapping components. In this paper, we propose a k-vertex connected component (k-VCC) model, which is much more cohesive, and thus supports overlapping between components very well. To discover k-VCCs, we propose three frameworks including top-down, bottom-up and hybrid frameworks. The top-down framework is first developed to find the exact k-VCCs by dividing the whole network. To further reduce the high computational cost for input networks of large sizes, a bottom-up framework is then proposed to locally identify the seed subgraphs, and obtain the heuristic k-VCCs by expanding and merging these seed subgraphs. Finally, the hybrid framework takes advantages of the above two frameworks. It exploits the results of bottom-up framework to construct the well-designed mixed graph and then discover the exact k-VCCs by contracting the mixed graph in a top-down way. Because the size of mixed graph is smaller than the original network, the hybrid framework runs much faster than the top-down framework. Comprehensive experimental are conducted on large real and synthetic networks and demonstrate the efficiency and effectiveness of the proposed exact and heuristic approaches.
Similar content being viewed by others
References
Adamcsek, B., Palla, G., Farkas, I., Derényi, I., Vicsek, T.: Cfinder: Locating cliques and overlapping modules in biological networks. Bioinformatics 22(8), 1021–1023 (2006)
Akiba, T., Iwata, Y., Yoshida, Y.: Linear-time enumeration of maximal k-edge-connected subgraphs in large networks by random contraction. In: CIKM, pp. 909–918 (2013)
Batagelj, V., Zaversnik, M.: An o (m) algorithm for cores decomposition of networks. arXiv:cs/0310049 (2003)
Berlowitz, D., Cohen, S., Kimelfeld, B.: Efficient enumeration of maximal k-plexes. In: SIGMOD, pp. 431–444 (2015)
Böger, C.A., Chen, M.H., Tin, A., Olden, M., Köttgen, A., de Boer, I.H., Fuchsberger, C., O’Seaghdha, C.M., Pattaro, C., Teumer, A., et al: Cubn is a gene locus for albuminuria. J. Am. Soc. Nephrol. 22(3), 555–570 (2011)
Chang, L., Yu, J.X., Qin, L., Lin, X., Liu, C., Liang, W.: Efficiently computing k-edge connected components via graph decomposition. In: SIGMOD, pp. 205–216 (2013)
Chang, L., Lin, X., Qin, L., Yu, J.X., Zhang, W.: Index-based optimal algorithms for computing steiner components with maximum connectivity. In: SIGMOD, pp. 459–474. ACM (2015)
Chen, L.Y., Zhao, W.H., Tian, W., Guo, J., Jiang, F., Jin, L.J., Sun, Y.X., Chen, K.M., An, L.L., Li, G., et al: Stk39 is an independent risk factor for male hypertension in Han Chinese. Int. J. Cardiol. 154(2), 122–127 (2012)
Cheng, J., Ke, Y., Chu, S., Özsu, M. T.: Efficient core decomposition in massive networks. In: ICDE, pp. 51–62 (2011)
Christophides, V., Karvounarakis, G., Plexousakis, D., Scholl, M., Tourtounis, S.: Optimizing taxonomic semantic Web queries using labeling schemes. Web Semantics: Science Services and Agents on the World Wide Web 1(2), 207–228 (2004)
Cohen, J.: Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report 16 (2008)
Consortium, W.T.C.C., et al.: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447(7145), 661 (2007)
Conte, A., Firmani, D., Mordente, C., Patrignani, M., Torlone, R.: Fast enumeration of large k-plexes. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 115–124. ACM (2017)
Cui, W., Xiao, Y., Wang, H., Wang, W.: Local search of communities in large graphs. In: SIGMOD, pp. 991–1002 (2014)
Diestel, R.: Graph theory. Grad Texts in Math (2005)
Esfahanian, A.H., Louis Hakimi, S.: On computing the connectivities of graphs and digraphs. Networks 14(2), 355–366 (1984)
Even, S., Tarjan, R.E.: Network flow and testing graph connectivity. SIAM J. Comput. 4(4), 507–518 (1975)
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3), 75–174 (2010)
Gregory, S.: Finding overlapping communities in networks by label propagation. J. Phys. 12(10), 103018 (2010)
Hariharan, R., Kavitha, T., Panigrahi, D., Bhalgat, A.: An o (mn) gomory-hu tree construction algorithm for unweighted graphs. In: ACM Symposium on Theory of Computing, pp. 605–614 (2007)
Hu, J., Wu, X., Cheng, R., Luo, S., Fang, Y.: Querying minimal steiner maximum-connected subgraphs in large graphs. In: CIKM, pp. 1241–1250. ACM (2016)
Huang, X., Cheng, H., Qin, L., Tian, W., Yu, J.X.: Querying k-truss community in large and dynamic graphs. In: SIGMOD, pp. 1311–1322 (2014)
Huang, X., Lu, W., Lakshmanan, L.V.: Truss decomposition of probabilistic graphs: Semantics and algorithms. In: ACM Proc. of SIGMOD, pp. 77–90. ACM (2016)
Kane, V., Mohanty, S.: A lower bound on the number of vertices of a graph. Proc. Am. Math. Soc. 72(1), 211–212 (1978)
Kargar, M., An, A.: Keyword search in graphs: Finding r-cliques. PVLDB 4 (10), 681–692 (2011)
Lappas, T., Liu, K., Terzi, E.: Finding a team of experts in social networks. In: KDD, pp. 467–476. ACM (2009)
Lee, C., Reid, F., McDaid, A., Hurley, N.: Detecting highly overlapping community structure by greedy clique expansion. arXiv:1002.1827(2010)
Li, Y., Zhao, Y., Wang, G., Zhu, F., Wu, Y., Shi, S.: Effective k-vertex connected component detection in large-scale networks. In: International Conference on Database Systems for Advanced Applications, pp. 404–421. Springer (2017)
Li, L., Zheng, K., Wang, S., Hua, W., Zhou, X.: Go slow to go fast: Minimal on-road time route scheduling with parking facilities using historical trajectory. VLDB J. 27(3), 321–345 (2018)
Lian, D., Zheng, K., Ge, Y., Cao, L., Chen, E., Xie, X.: Geomf++: Scalable location recommendation via joint geographical modeling and matrix factorization. ACM Trans. Inf. Syst. (TOIS) 36(3), 33 (2018)
Lim, S., Ryu, S., Kwon, S., Jung, K., Lee, J.G.: Linkscan*: Overlapping community detection using the link-space transformation. In: ICDE, pp. 292–303. IEEE (2014)
Liu, G., Liu, Y., Zheng, K., Liu, A., Li, Z., Wang, Y., Zhou, X.: Mcs-gpm: Multi-constrained simulation based graph pattern matching in contextual social graphs. IEEE Trans. Knowl. Data Eng. 30(6), 1050–1064 (2017)
Mokken, R.J.: Cliques, clubs and clans. Quality & Quantity 13(2), 161–173 (1979)
Molloy, M., Reed, B.: The size of the giant component of a random graph with a given degree sequence. Comb. Probab. Comput. 7(03), 295–305 (1998)
Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)
Pattillo, J., Youssef, N., Butenko, S.: On clique relaxation models in network analysis. Eur. J. Oper. Res. 226(1), 9–18 (2013)
Shan, J., Shen, D., Nie, T., Kou, Y., Yu, G.: Searching overlapping communities for group query. World Wide Web 19(6), 1179–1202 (2016)
Slavin, T.P., Feng, T., Schnell, A., Zhu, X., Elston, R.C.: Two-marker association tests yield new disease associations for coronary artery disease and hypertension. Human Gen. 130(6), 725–733 (2011)
Sozio, M., Gionis, A.: The community-search problem and how to plan a successful cocktail party. In: SIGKDD, pp. 939–948 (2010)
Stoer, M., Wagner, F.: A simple min-cut algorithm. J. ACM (JACM) 44(4), 585–591 (1997)
Sun, H., Huang, J., Bai, Y., Zhao, Z., Jia, X., He, F., Li, Y.: Efficient k-edge connected component detection through an early merging and splitting strategy. Knowl.-Based Syst. 111, 63–72 (2016)
Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
Wang, N., Zhang, J., Tan, K.L., Tung, A.K.: On triangulation-based dense neighborhood graph discovery. PVLDB 4(2), 58–68 (2010)
Wang, Y., O’Connell, J.R., McArdle, P.F., Wade, J.B., Dorff, S.E., Shah, S.J., Shi, X., Pan, L., Rampersaud, E., Shen, H., et al.: Whole-genome association study identifies stk39 as a hypertension susceptibility gene. Proc. Natl. Acad. Sci. 106(1), 226–231 (2009)
Wu, Y., Jin, R., Li, J., Zhang, X.: Robust local community detection: On free rider effect and its elimination. PVLDB 8(7), 798–809 (2015)
Wu, Y., Jin, R., Zhu, X., Zhang, X.: Finding dense and connected subgraphs in dual networks. In: ICDE, pp. 915–926 (2015)
Wu, Y., Zhu, X., Li, L., Fan, W., Jin, R., Zhang, X.: Mining dual networks: Models, algorithms and applications. TKDD 10(4), 40 (2016)
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: ICDM, pp. 745–754 (2012)
Zeng, Z., Wang, J., Zhou, L., Karypis, G.: Coherent closed quasi-clique discovery from large dense graph databases. In: KDD, pp. 797–802 (2006)
Zhao, Y., Zheng, K., Li, Y., Su, H., Liu, J., Zhou, X.: Destination-aware task assignment in spatial crowdsourcing: A worker decomposition approach. IEEE Transactions on Knowledge and Data Engineering (2019)
Zheng, K., Zheng, Y., Yuan, N.J., Shang, S., Zhou, X.: Online discovery of gathering patterns over trajectories. IEEE Trans. Knowl. Data Eng. 26(8), 1974–1988 (2013)
Zheng, B., Su, H., Hua, W., Zheng, K., Zhou, X., Li, G.: Efficient clue-based route search on road networks. IEEE Trans. Knowl. Data Eng. 29 (9), 1846–1859 (2017)
Zheng, K., Zhao, Y., Lian, D., Zheng, B., Liu, G., Zhou, X.: Reference-based framework for spatio-temporal trajectory compression and query processing. IEEE Transactions on Knowledge and Data Engineering (2019)
Zhou, R., Liu, C., Yu, J.X., Liang, W., Chen, B., Li, J.: Finding maximal k-edge-connected subgraphs from a large graph. In: EDBT, pp. 480–491 (2012)
Acknowledgments
This research is partially supported by the National NSFC (61672041, 61772124, 61732003,61902004,61977001), National Key Research and Development Program of China (2018YFB1004402), the Start-up Funds of North China University of Technology, and the National Research Foundation, Prime Ministers Office, Singapore under its International Research Centres in Singapore Funding Initiative and the Pinnacle lab for Analytics at SMU.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Graph Data Management in Online Social Networks
Guest Editors: Kai Zheng, Guanfeng Liu, Mehmet A. Orgun, and Junping Du
Rights and permissions
About this article
Cite this article
Li, Y., Wang, G., Zhao, Y. et al. Towards k-vertex connected component discovery from large networks. World Wide Web 23, 799–830 (2020). https://doi.org/10.1007/s11280-019-00725-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-019-00725-6