Hybrid visual computing models to discover the clusters assessment of high dimensional big data

Suleman Basha, M.; Mouleeswaran, S. K.; Rajendra Prasad, K.

doi:10.1007/s00500-022-07092-x

Focus
Published: 23 April 2022

Soft Computing, volume 27, pages 4249–4262 (2023)

M. Suleman Basha¹,
S. K. Mouleeswaran¹ &
K. Rajendra Prasad²

Abstract

Cluster assessment is a major open problem in big data clustering. Leading big data partitioning techniques, such as spherical k-means and mini-batch k-means, are widely used in large data applications. However, they need prior information from cluster assessment to produce good-quality clusters over big data. Existing visual models, namely clustering with improved visual assessment of tendency and sample viewpoints cosine-based similarity VAT (SVPCS-VAT), perform cluster assessment of big data efficiently. For high-dimensional big data, SVPCS-VAT is enhanced with four subspace learning techniques: principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projection (LPP), and neighborhood preserving embedding (NPE). These are used to develop hybrid visual computing models (PCA-based SVPCS-VAT, LDA-based SVPCS-VAT, LPP-based SVPCS-VAT, and NPE-based SVPCS-VAT) that overcome the curse of dimensionality. Experiments are conducted on benchmark datasets to demonstrate the efficiency of these models and to compare them with state-of-the-art big data clustering methods.
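To make the idea of a hybrid model concrete, the following is a minimal sketch of a PCA-then-VAT pipeline. It is not the authors' SVPCS-VAT: the sample-viewpoint similarity is not specified in the abstract, so plain cosine dissimilarity on the PCA-reduced data stands in for it, and the reordering follows the standard VAT procedure of Bezdek and Hathaway (2002). All function names (`pca_reduce`, `cosine_dissimilarity`, `vat_order`, `pca_vat`) are illustrative assumptions.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def cosine_dissimilarity(X):
    """Pairwise 1 - cosine similarity (assumed stand-in for SVPCS)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)            # guard against zero rows
    D = 1.0 - Xn @ Xn.T
    np.fill_diagonal(D, 0.0)
    return np.clip(D, 0.0, 2.0)


def vat_order(D):
    """VAT ordering: a Prim-like traversal of the dissimilarity matrix."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    min_dist = D[start].copy()                   # distance to the selected set
    for _ in range(n - 1):
        candidates = np.where(remaining, min_dist, np.inf)
        nxt = int(np.argmin(candidates))         # closest unselected object
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)


def pca_vat(X, n_components=2):
    """Reduce with PCA, then return the VAT-reordered dissimilarity image."""
    Z = pca_reduce(X, n_components)
    D = cosine_dissimilarity(Z)
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Displaying the returned matrix as a grayscale image (for example with matplotlib's `imshow`) shows dark square blocks along the diagonal when the data have cluster structure; the number of blocks suggests the number of clusters to pass to a partitioning method such as k-means. Swapping `pca_reduce` for another subspace projection would give the analogous LDA-, LPP-, or NPE-based variant under the same assumptions.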

Data availability

Enquiries about data availability should be directed to the authors.

Funding

There is no funding support for this work.

Author information

Authors and Affiliations

Department of CSE, Dayananda Sagar University, Bangalore, India
M. Suleman Basha & S. K. Mouleeswaran
Department of CSE, RGM College of Engineering and Technology, Nandyal, India
K. Rajendra Prasad

Contributions

MSB and SKM contributed to the design of the hybrid visual computing models. MSB collected the related study data on visual techniques for cluster assessment problems. KRP carried out the data analysis and the interpretation of the clustering results with the indicated measures, and performed the critical investigation of the experimental work. MSB wrote the paper with the advice of the other authors, and SM revised the paper for quality. MSB, SKM, KRP: Conceptualization; MSB, SKM: Data curation; MSB, SKM, KRP: Formal analysis; KRP: Funding acquisition (Science and Engineering Research Board (SERB) grant of the Department of Science and Technology (DST), Government of India, sanctioned file number ECR/2016/001556); MSB, SKM, KRP: Investigation; MSB, SKM, KRP: development of three new methods, namely PCA-based SVPCS-VAT, LDA-based SVPCS-VAT, and LPP-based SVPCS-VAT; SKM: Project administration; MSB, RP: Resources; SK, KRP: Supervision; SB: Visualization; MSB, KRP: Writing, original draft; SKM: Writing, review and editing.

Corresponding author

Correspondence to M. Suleman Basha.

Ethics declarations

Conflict of interest

There is no conflict of interest from any side.

Additional information

Communicated by V Suma.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suleman Basha, M., Mouleeswaran, S.K. & Rajendra Prasad, K. Hybrid visual computing models to discover the clusters assessment of high dimensional big data. Soft Comput 27, 4249–4262 (2023). https://doi.org/10.1007/s00500-022-07092-x

Accepted: 25 March 2022
Published: 23 April 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00500-022-07092-x
