Fuzzy rough clustering for categorical data

Xu, Shuliang; Liu, Shenglan; Zhou, Jian; Feng, Lin

doi:10.1007/s13042-019-01012-6

Original Article
Published: 19 September 2019

Volume 10, pages 3213–3223, (2019)

International Journal of Machine Learning and Cybernetics

Shuliang Xu (ORCID: orcid.org/0000-0002-5464-2354)¹,
Shenglan Liu²,
Jian Zhou¹ &
Lin Feng²

Abstract

Unlabeled categorical data are common in many applications. Because categorical data have no geometric structure, discovering knowledge and patterns in unlabeled categorical data is an important problem. In this paper, a fuzzy rough clustering algorithm for categorical data is proposed. The proposed algorithm uses the partition induced by each attribute to calculate that attribute's granularity, and it introduces information granularity to measure the significance of each attribute. Unlike traditional clustering algorithms for categorical data, the proposed algorithm transforms a categorical data set into a numeric data set and applies a nonlinear dimension reduction algorithm to reduce the dimensionality of the data set. The proposed algorithm and the comparison algorithms are executed on real data sets. The experimental results show that the proposed algorithm outperforms the comparison algorithms on most of the data sets, indicating that it is an effective clustering algorithm for categorical data sets.
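
To make the idea of attribute granularity concrete, the following is a minimal sketch and not the authors' method: the abstract does not give the paper's formulas. It assumes the common rough-set definition of knowledge granularity for a partition, sum(|X_i|^2) / |U|^2 over the equivalence classes X_i, and a hypothetical significance weight of one minus that value, normalized across attributes. The function names `partition_granularity` and `attribute_weights` and the toy data are illustrative assumptions.

```python
from collections import Counter


def partition_granularity(values):
    """Granularity of the partition induced by one categorical attribute.

    Assumed definition (not taken from the paper): sum(|X_i|^2) / |U|^2,
    where the X_i are the groups of objects sharing the same value.
    It equals 1.0 for a constant attribute and 1/|U| when all values differ.
    """
    n = len(values)
    if n == 0:
        raise ValueError("attribute column must not be empty")
    counts = Counter(values)
    return sum(c * c for c in counts.values()) / (n * n)


def attribute_weights(rows):
    """Hypothetical significance weights: finer partitions get larger weights."""
    columns = list(zip(*rows))
    scores = [1.0 - partition_granularity(col) for col in columns]
    total = sum(scores)
    if total == 0:
        # Every attribute is constant: fall back to uniform weights.
        return [1.0 / len(columns)] * len(columns)
    return [s / total for s in scores]


# Toy data set (illustrative only): three categorical attributes, four objects.
rows = [
    ("red", "small", "yes"),
    ("red", "large", "yes"),
    ("blue", "small", "yes"),
    ("green", "large", "yes"),
]
weights = attribute_weights(rows)
print(weights)
```

In this toy data set the third attribute is constant, so it induces a single block and receives zero weight, while the first attribute induces the finest partition and receives the largest weight.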

Notes

http://archive.ics.uci.edu/ml/index.php

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Nos. 2017YFB1300200, 2017YFB1300203), the National Natural Science Fund of China (Nos. 61972064, 61672130, 61602082, 61627808, 91648205), the Open Program of State Key Laboratory of Software Architecture (No. SKLSAOP1701), the LiaoNing Revitalization Talents Program (No. XLYC1806006), the Fundamental Research Funds for the Central Universities (Nos. DUT19RC(3)012, DUT17RC(3)071), and the development of science and technology of Guangdong province special fund project (No. 2016B090910001). The authors are grateful to the editor and the anonymous reviewers for constructive comments that helped to improve the quality and presentation of this paper.

Author information

Authors and Affiliations

Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
Shuliang Xu & Jian Zhou
School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
Shenglan Liu & Lin Feng

Corresponding author

Correspondence to Lin Feng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, S., Liu, S., Zhou, J. et al. Fuzzy rough clustering for categorical data. Int. J. Mach. Learn. & Cyber. 10, 3213–3223 (2019). https://doi.org/10.1007/s13042-019-01012-6

Received: 27 April 2019
Accepted: 04 September 2019
Published: 19 September 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s13042-019-01012-6
