Abstract
To address the over-sampling of high-degree and low-degree nodes in current sampling algorithms, a node Neighborhood Clustering coefficient Hierarchical Random Walk (NCHRW) sampling method is proposed. First, the network is divided into layers according to its degree distribution, and the k-means clustering algorithm is used to determine the number of layers. Second, the boundary values between layers are determined by taking the accuracy of the degree distribution into account. Third, at each layer, sampling takes into account not only the degree of the current node and the number of common neighbors between the current node and its neighbors, but also the clustering coefficients of those neighbors. Finally, on eight real networks and one synthetic network, NCHRW is compared with existing algorithms in six respects: degree distribution, density, average degree, average clustering coefficient, transitivity, and visualization of the sampled network. The results show that NCHRW significantly outperforms nine other traditional sampling algorithms in terms of degree distribution, density, and average degree, and that it preserves the topological properties of the network very well.
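The layered walk described above can be illustrated with a minimal Python sketch. It is not the NCHRW algorithm itself: the abstract does not give the transition formula, so the weight used here, (1 - common/deg(u)) * (1 + C(v)), is only an assumed way of combining the three quantities it names. The layer boundaries are passed in by hand rather than derived with k-means, and every function name, the `restart` probability, and the toy graph are hypothetical.

```python
import random

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering(adj, v):
    """Local clustering coefficient of node v."""
    nbrs = sorted(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def split_layers(adj, boundaries):
    """Group nodes by degree; boundaries are ascending upper bounds.
    Assumption: boundaries are given (the paper derives them)."""
    layers = [set() for _ in range(len(boundaries) + 1)]
    for v, nbrs in adj.items():
        layers[sum(1 for b in boundaries if len(nbrs) > b)].add(v)
    return layers

def walk_layer(adj, layer, budget, rng, restart=0.15):
    """Weighted random walk restricted to one layer (illustrative weights)."""
    if not layer or budget <= 0:
        return set()
    budget = min(budget, len(layer))
    pool = sorted(layer)
    current = rng.choice(pool)
    sampled = {current}
    for _ in range(50 * len(layer)):          # hard cap so the loop always ends
        if len(sampled) >= budget:
            break
        candidates = sorted(adj[current] & layer)
        if not candidates or rng.random() < restart:
            current = rng.choice(pool)        # jump to avoid getting stuck
        else:
            deg_u = len(adj[current])
            # Assumed weight: fewer common neighbors and a higher
            # clustering coefficient of the neighbor raise its weight.
            weights = [(1.0 - len(adj[current] & adj[v]) / deg_u)
                       * (1.0 + clustering(adj, v)) for v in candidates]
            current = rng.choices(candidates, weights=weights, k=1)[0]
        sampled.add(current)
    return sampled

def hierarchical_sample(adj, boundaries, rate, seed=0):
    """Sample roughly `rate` of the nodes from every degree layer."""
    rng = random.Random(seed)
    sampled = set()
    for layer in split_layers(adj, boundaries):
        sampled |= walk_layer(adj, layer, round(rate * len(layer)), rng)
    return sampled

def density_and_avg_degree(adj, nodes):
    """Density and average degree of the subgraph induced by `nodes`."""
    n = len(nodes)
    m = sum(len(adj[v] & nodes) for v in nodes) / 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return density, (2 * m / n if n else 0.0)

# Toy graph: two triangles joined by an edge, plus three leaves on node 5.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5),
         (5, 6), (5, 7), (5, 8)]
adj = build_adj(edges)
sample = hierarchical_sample(adj, boundaries=[2], rate=0.5, seed=1)
print(sorted(sample), density_and_avg_degree(adj, sample))
```

Sampling each degree layer separately with a proportional budget is what keeps high-degree and low-degree nodes from dominating the sample. Comparing `density_and_avg_degree` on the full node set and on the sample gives a rough version of two of the evaluation criteria listed above.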
Acknowledgements
This work was supported in part by the Science and Technology Research Project of Chongqing Municipal Education Commission (KJZD-K202001101), the Chongqing Ba-nan District Science and Technology Bureau Science and Technology Talents Special Project (2020.58), the General Project of Chongqing Natural Science Foundation (cstc2021jcyj-msxmX0162), the 2021 National Education Examination Research Project (GJK2021028), the 2020 Chongqing Municipal Human Resources and Social Security Bureau Innovation Project for Returned Overseas Persons (cx2020031), and the 2020 National Statistical Science Research Project (2020412).
Author information
Contributions
XL: conceptualization, software, methodology, validation, data curation, writing-review & editing. MZ: methodology, formal analysis, writing-original draft, data curation, writing-review & editing. GF: methodology, writing-review & editing. PDM: methodology, writing-review & editing.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Liu, X., Zhang, M., Fiumara, G. et al. Complex Network Hierarchical Sampling Method Combining Node Neighborhood Clustering Coefficient with Random Walk. New Gener. Comput. 40, 765–807 (2022). https://doi.org/10.1007/s00354-022-00179-x
DOI: https://doi.org/10.1007/s00354-022-00179-x