Abstract
In this article, we present an approach to handling big data effectively using adaptive clustering and optimization techniques. First, heterogeneous data is collected from multiple sources and transformed into the desired network graphs. After finding patterns in the graphs, the module distributes the data into the appropriate data blocks using entropy and sigmoid based K-means clustering. Subsequently, an adaptive grey wolf optimization (AGWO) algorithm in the Hadoop distributed file system (HDFS) assigns the data blocks to the appropriate machines. The optimized HDFS serves as a data source for services that execute queries, and it provides a platform for applying graph algorithms efficiently while reducing resource usage. As a result, the approach handles a broad range of data types while addressing query time and resource usage. In experiments, the proposed work outperforms existing methods such as GWO and PSO in terms of algorithm run time, loading time, resource usage, query time, query execution time, and convergence.
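For readers unfamiliar with the GWO baseline named in the abstract, the following is a minimal sketch of the standard grey wolf optimizer. It is not the adaptive variant (AGWO) proposed in the article, whose details the abstract does not give. The function name `gwo_minimize`, its parameters, and the toy sphere objective are illustrative assumptions. How wolf positions would be mapped to block-to-machine placements in HDFS is not described in the abstract and is not attempted here.

```python
import random


def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iters=100, seed=0):
    """Standard grey wolf optimizer (hypothetical helper, not the paper's AGWO).

    Minimizes `objective` over a box [lo, hi]^dim and returns (position, value).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")

    for t in range(n_iters + 1):
        # Rank the pack; the three best wolves (alpha, beta, delta) lead the hunt.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # copies, so in-place updates are safe
        alpha_val = objective(leaders[0])
        if alpha_val < best_val:
            best_pos, best_val = list(leaders[0]), alpha_val
        if t == n_iters:
            break  # the last pass only evaluates the final pack

        a = 2.0 * (1.0 - t / n_iters)  # exploration factor, decays linearly from 2 to 0
        for w in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # step coefficient in [-a, a]
                    C = 2.0 * rng.random()           # leader weighting in [0, 2]
                    D = abs(C * leader[d] - w[d])    # distance to the weighted leader
                    estimate += leader[d] - A * D
                # New coordinate: mean of the three leader-guided estimates, clamped.
                w[d] = min(hi, max(lo, estimate / 3.0))

    return best_pos, best_val


if __name__ == "__main__":
    # Toy objective (an assumption for illustration): sphere function, minimum at 0.
    sphere = lambda x: sum(v * v for v in x)
    pos, val = gwo_minimize(sphere, dim=3, bounds=(-10.0, 10.0))
    print(pos, val)
```

An adaptive variant would typically change how the coefficient `a` or the leader weighting evolves over the iterations. Which adaptation the authors use is not stated in the abstract.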
Data availability
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
Change history
20 October 2022
Dr Gaurav Garg's affiliation has been updated. 'Himachal Pradesh' has been inserted after 'Baddi'.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Informed consent
Informed consent is not applicable to this study.
Ethical approval
Ethical approval is not applicable. This article does not contain any studies with human participants performed by any of the authors.
About this article
Cite this article
Vankdothu, R., Hameed, M.A., Bhukya, R. et al. Entropy and sigmoid based K-means clustering and AGWO for effective big data handling. Multimed Tools Appl 82, 15287–15304 (2023). https://doi.org/10.1007/s11042-022-13929-2
DOI: https://doi.org/10.1007/s11042-022-13929-2