Fuzzy rough clustering for categorical data

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Unlabeled categorical data is common in many applications. Because categorical data has no geometric structure, discovering knowledge and patterns in unlabeled categorical data is an important problem. In this paper, a fuzzy rough clustering algorithm for categorical data is proposed. The proposed algorithm uses the partition induced by each attribute to calculate that attribute's granularity, and it introduces information granularity to measure the significance of each attribute. Unlike traditional clustering algorithms for categorical data, the proposed algorithm transforms the categorical data set into a numeric data set and applies a nonlinear dimension reduction algorithm to reduce the dimensionality of the data set. The proposed algorithm and the comparison algorithms are executed on real data sets. The experimental results show that the proposed algorithm outperforms the comparison algorithms on most of the data sets, which indicates that it is an effective clustering algorithm for categorical data sets.
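To make the notion of attribute granularity concrete, the following minimal sketch computes the partition induced by each attribute and a granularity value for it. The abstract does not state the paper's formula, so the sketch assumes one commonly used definition from the rough set literature, GK = Σ|X_i|² / |U|², where the X_i are the blocks of the partition and U is the set of objects. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def attribute_partition(column):
    """Group object indices by attribute value.

    The resulting blocks form the partition of the objects
    induced by a single categorical attribute.
    """
    blocks = {}
    for idx, value in enumerate(column):
        blocks.setdefault(value, []).append(idx)
    return list(blocks.values())


def granularity(column):
    """Assumed definition: sum of squared block sizes over |U|^2.

    Values lie in [1/|U|, 1]; a smaller value means a finer partition.
    This is an illustrative choice, not necessarily the paper's measure.
    """
    n = len(column)
    return sum(len(block) ** 2 for block in attribute_partition(column)) / n ** 2


# Hypothetical toy data: four objects described by two categorical attributes.
data = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("green", "small"),
]

columns = list(zip(*data))  # one tuple of values per attribute
gran = [granularity(col) for col in columns]
print(gran)
```

Under this assumed definition, an attribute that splits the objects into more, smaller blocks receives a lower granularity value, which is one way such a quantity could feed into a measure of attribute significance.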

Notes

  1. http://archive.ics.uci.edu/ml/index.php

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Nos. 2017YFB1300200, 2017YFB1300203), the National Natural Science Fund of China (Nos. 61972064, 61672130, 61602082, 61627808, 91648205), the Open Program of State Key Laboratory of Software Architecture (No. SKLSAOP1701), the LiaoNing Revitalization Talents Program (No. XLYC1806006), the Fundamental Research Funds for the Central Universities (Nos. DUT19RC(3)012, DUT17RC(3)071), and the special fund project for the development of science and technology of Guangdong province (No. 2016B090910001). The authors are grateful to the editor and the anonymous reviewers for constructive comments that helped to improve the quality and presentation of this paper.

Author information

Corresponding author

Correspondence to Lin Feng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, S., Liu, S., Zhou, J. et al. Fuzzy rough clustering for categorical data. Int. J. Mach. Learn. & Cyber. 10, 3213–3223 (2019). https://doi.org/10.1007/s13042-019-01012-6

  • DOI: https://doi.org/10.1007/s13042-019-01012-6
