Enhancement of Data Streaming in Clustering for Uncertain Data

  • Conference paper
Proceedings of the International Conference on Intelligent Systems and Signal Processing

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 671)

Abstract

Data stream mining derives interesting models from a stream of information. The objective of data stream clustering is to group streaming data and discover unknown patterns in it. Streaming data generated by imprecise hardware carries inherent uncertainty. In this paper, we propose a novel approach to data stream clustering that reduces the degree of uncertainty and increases the degree of homogeneity. Unlike existing procedures, which aim only to cluster the data stream, we consider the case where the uncertainty of micro-clusters affects the quality of the clustering outcome. The proposed algorithm, Fuzzy-HCMStream, is based on the HCMStream algorithm and incorporates the concept of the Fuzzy C-Means algorithm to account for uncertainty. It increases the degree of homogeneity, which improves clustering quality. The proposed algorithm is tested on two data sets, one synthetic and one real. Fuzzy-HCMStream shows improved performance over other state-of-the-art algorithms on various indices for evaluating clustering results.
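As a rough illustration of the Fuzzy C-Means idea the abstract refers to, the sketch below computes the standard FCM membership degrees of a single point with respect to a set of cluster centers. It is not the Fuzzy-HCMStream algorithm itself, whose details are not given here; the function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the use of Euclidean distance are assumptions made for this example.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard Fuzzy C-Means membership of one point in each center.

    Hypothetical helper for illustration only; `m` is the fuzzifier (m > 1).
    """
    # Euclidean distance from the point to every cluster center.
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]

    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]


if __name__ == "__main__":
    # A point closer to the first center receives the larger membership.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

Memberships always sum to 1, so a point near the boundary between two clusters is shared between them instead of being forced into one, which is how a fuzzy assignment can express uncertainty about where a data point belongs.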

Corresponding author

Correspondence to Jeny Ganatra.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ganatra, J., Thacker, C. (2018). Enhancement of Data Streaming in Clustering for Uncertain Data. In: Kher, R., Gondaliya, D., Bhesaniya, M., Ladid, L., Atiquzzaman, M. (eds) Proceedings of the International Conference on Intelligent Systems and Signal Processing. Advances in Intelligent Systems and Computing, vol 671. Springer, Singapore. https://doi.org/10.1007/978-981-10-6977-2_13

  • DOI: https://doi.org/10.1007/978-981-10-6977-2_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6976-5

  • Online ISBN: 978-981-10-6977-2

  • eBook Packages: Engineering, Engineering (R0)
