Density-based clustering of big probabilistic graphs

Halim, Zahid; Khattak, Jamal Hussain

doi:10.1007/s12530-018-9223-2

Original Paper
Published: 22 March 2018

Evolving Systems, volume 10, pages 333–350 (2019)


Abstract

Clustering is a machine learning task that groups similar objects into coherent sets, so that the objects within a cluster exhibit similar behavior. With the exponential increase in data volume, robust approaches are required to process the data and extract clusters. In addition to their large volume, datasets may contain uncertainties caused by the heterogeneity of the data sources, which together characterize Big Data. Modern machine learning approaches and algorithms widely use probability theory to quantify data uncertainty. Such large, uncertain data can be transformed into a probabilistic graph-based representation. This work presents an approach for density-based clustering of big probabilistic graphs. The approach clusters large probabilistic graphs using the graph’s density, with the clustering process guided by node degree and neighborhood information. It is evaluated on seven real-world benchmark datasets: protein-to-protein interaction, yahoo, movie-lens, core, last.fm, delicious social bookmarking system, and epinions. These datasets are first transformed into a graph-based representation before the proposed clustering algorithm is applied. The results are evaluated using three cluster validation indices: the Davies–Bouldin index, the Dunn index, and the Silhouette coefficient. The proposal is also compared with four state-of-the-art approaches for clustering large probabilistic graphs. Across the seven datasets and three cluster validity indices, the results suggest that the proposed approach performs better.
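To make the idea of degree- and neighborhood-guided clustering concrete, the following is a minimal illustrative sketch and not the authors' algorithm, whose details are not given in the abstract. It assumes a probabilistic graph stored as a dictionary mapping undirected edges to existence probabilities, uses the expected degree (the sum of incident edge probabilities) as a simple density proxy, and grows each cluster from the densest unassigned node over its sufficiently probable neighbors. The function names, the `min_prob` threshold, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def expected_degrees(edges):
    """Expected degree of each node: the sum of its incident edge probabilities."""
    deg = defaultdict(float)
    for (u, v), p in edges.items():
        deg[u] += p
        deg[v] += p
    return dict(deg)


def degree_guided_clusters(edges, min_prob=0.5):
    """Greedy sketch: seed clusters at high-expected-degree nodes and
    attach unassigned neighbors whose edge probability is >= min_prob."""
    # Build an undirected adjacency map: node -> {neighbor: probability}.
    adj = defaultdict(dict)
    for (u, v), p in edges.items():
        adj[u][v] = p
        adj[v][u] = p

    deg = expected_degrees(edges)
    labels = {}
    next_label = 0
    # Visit nodes from highest to lowest expected degree
    # (ties broken by node name so the result is deterministic).
    for seed in sorted(deg, key=lambda n: (-deg[n], n)):
        if seed in labels:
            continue
        labels[seed] = next_label
        for nbr, p in adj[seed].items():
            if nbr not in labels and p >= min_prob:
                labels[nbr] = next_label
        next_label += 1
    return labels


# Hypothetical toy graph: two dense triangles joined by one unlikely edge.
edges = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7,
    ("c", "d"): 0.1,
    ("d", "e"): 0.9, ("d", "f"): 0.8, ("e", "f"): 0.6,
}
labels = degree_guided_clusters(edges)
print(labels)
```

On this toy graph the two triangles end up in separate clusters because the only edge connecting them has a low probability. A real method would also need to handle multi-hop neighborhoods and scale to large graphs, which this sketch does not attempt.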

Acknowledgements

The authors would like to thank the anonymous reviewers for their helpful and constructive comments, which greatly contributed to improving the quality of the manuscript.

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi, Pakistan
Zahid Halim & Jamal Hussain Khattak
Business Solutions and Development, Information Technology Group, Allied Bank Limited, Lahore, Pakistan
Jamal Hussain Khattak

Corresponding author

Correspondence to Zahid Halim.

About this article

Cite this article

Halim, Z., Khattak, J.H. Density-based clustering of big probabilistic graphs. Evolving Systems 10, 333–350 (2019). https://doi.org/10.1007/s12530-018-9223-2

Received: 10 June 2017
Accepted: 14 March 2018
Published: 22 March 2018
Issue Date: September 2019
DOI: https://doi.org/10.1007/s12530-018-9223-2
