Entropy and sigmoid based K-means clustering and AGWO for effective big data handling

Multimedia Tools and Applications

This article has been updated

Abstract

In this article, we present an approach for handling big data effectively using adaptive clustering and optimization techniques. First, heterogeneous data is collected from multiple sources and transformed into the desired network graphs. After patterns are found in the graphs, the module distributes the data into the appropriate data blocks using entropy and sigmoid based K-means clustering. Next, an adaptive grey wolf optimization (AGWO) algorithm in the Hadoop distributed file system (HDFS) assigns the data blocks to the appropriate machines. This optimized HDFS serves as a data source for services that execute queries, provides a platform for applying graph algorithms efficiently, and reduces resource usage. As a result, the approach can handle a broad range of data types while reducing query time and resource usage. Experimental results show that the proposed work outperforms existing methods such as GWO and PSO in terms of algorithm run time, loading time, resource usage, query time, query execution time, and convergence.
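The abstract does not spell out AGWO's adaptive update rules, so the following is only a minimal sketch of the standard grey wolf optimizer (GWO), the baseline the abstract compares against, and not the authors' AGWO. The function name `gwo_minimize`, its parameters, and the toy objective (the sphere function) are hypothetical choices for illustration; assigning data blocks to HDFS machines would need a problem-specific fitness function that is not described here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_minimize(
    f: Callable[[Sequence[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_wolves: int = 20,
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Hypothetical sketch of standard GWO (not AGWO) minimizing f on [lower, upper]^dim."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least 3 wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos: List[float] = []
    best_val = float("inf")

    for t in range(n_iter):
        # Rank the pack; the three fittest wolves act as alpha, beta and delta.
        ranked = sorted(wolves, key=f)
        if f(ranked[0]) < best_val:
            best_pos, best_val = list(ranked[0]), f(ranked[0])
        leaders = ranked[:3]
        # Control parameter a decreases linearly from 2 to 0,
        # shifting the search from exploration to exploitation.
        a = 2.0 - 2.0 * t / n_iter
        new_wolves = []
        for wolf in wolves:
            new_pos = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a    # A drawn from [-a, a]
                    C = 2.0 * rng.random()            # C drawn from [0, 2]
                    D = abs(C * leader[d] - wolf[d])  # distance to this leader
                    total += leader[d] - A * D        # leader-guided candidate
                # New coordinate: mean of the three leader-guided candidates,
                # clipped to the search bounds.
                new_pos.append(min(max(total / 3.0, lower), upper))
            new_wolves.append(new_pos)
        wolves = new_wolves

    # Also consider the final pack before returning the best-so-far solution.
    final_best = min(wolves, key=f)
    if f(final_best) < best_val:
        best_pos, best_val = list(final_best), f(final_best)
    return best_pos, best_val


# Usage on a toy objective (assumption: sphere function, minimum 0 at the origin).
def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


pos, val = gwo_minimize(sphere, dim=5, lower=-10.0, upper=10.0)
print(val)
```

The fixed seed makes runs reproducible; an adaptive variant would typically change how `a` (or the leader weighting) evolves, but the specific adaptation used in this work is not given in the abstract.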

Figures: Fig. 1 to Fig. 6. Algorithms: Algorithm 1 and Algorithm 2.

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Change history

  • 20 October 2022

    Dr Gaurav Garg's affiliation has been updated. 'Himachal Pradesh' has been inserted after 'Baddi'.

Author information

Corresponding author

Correspondence to Ramdas Vankdothu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Informed consent

Not applicable.

Ethical approval

Not applicable. This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vankdothu, R., Hameed, M.A., Bhukya, R. et al. Entropy and sigmoid based K-means clustering and AGWO for effective big data handling. Multimed Tools Appl 82, 15287–15304 (2023). https://doi.org/10.1007/s11042-022-13929-2

  • DOI: https://doi.org/10.1007/s11042-022-13929-2