Reduct and Variance Based Clustering of High Dimensional Dataset

Rajput, Dharmveer Singh; Singh, P. K.; Bhattacharya, M.

doi:10.1007/978-3-642-27872-3_11

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6411)

Included in the following conference series:

International Conference on Data Engineering and Management

Abstract

The performance of traditional clustering algorithms generally degrades on high dimensional data, for two reasons: some dimensions are likely to be irrelevant or to contain noisy data, and randomly selected initial cluster centres can cause the clustering to converge to a local minimum. In this paper, we propose a framework for clustering high dimensional data that combines attribute subset selection with efficient cluster centre initialization. In the first phase, rough set theory is used to determine the relevant attributes (dimensions). In the second phase, the dimension with maximum variance is used to determine the optimal initial centres of the clusters. In the third phase, the k-means clustering algorithm is applied with these initial cluster centres to find an optimal clustering of the data set. The framework substantially improves the efficiency of the clustering process, and our experiment on a test data set shows that the accuracy of the results improves considerably.
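The three phases can be illustrated with a minimal Python sketch. The abstract does not give the exact procedures, so every detail here is an assumption: the reduct is found by greedy backward elimination on already-discretized attributes (an attribute is dropped if the remaining ones induce the same indiscernibility partition), the initial centres are the means of k equal-sized groups obtained by sorting the data along the maximum-variance dimension, and k-means is plain Lloyd iteration. The function names `reduct_by_indiscernibility`, `max_variance_init`, and `kmeans` are hypothetical and do not come from the paper.

```python
import numpy as np


def reduct_by_indiscernibility(X_discrete):
    """Phase 1 (assumed variant): greedy backward elimination.

    An attribute is dropped when the remaining attributes still split the
    objects into the same number of indiscernibility classes. Because a
    subset of attributes can only merge classes, an equal count means the
    partition is unchanged.
    """
    def n_classes(cols):
        return len({tuple(row) for row in X_discrete[:, cols]})

    keep = list(range(X_discrete.shape[1]))
    full = n_classes(keep)
    for a in range(X_discrete.shape[1]):
        trial = [c for c in keep if c != a]
        if trial and n_classes(trial) == full:
            keep = trial
    return keep


def max_variance_init(X, k):
    """Phase 2 (assumed variant): sort along the maximum-variance
    dimension, cut into k nearly equal groups, use each group's mean.
    Requires k <= number of rows."""
    dim = int(np.argmax(X.var(axis=0)))
    order = np.argsort(X[:, dim], kind="stable")
    groups = np.array_split(order, k)
    return np.array([X[g].mean(axis=0) for g in groups])


def kmeans(X, centres, n_iter=100):
    """Phase 3: Lloyd's k-means starting from the given centres
    (n_iter must be at least 1)."""
    centres = np.asarray(centres, dtype=float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centre; keep the old one if its cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(len(centres))
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two informative columns plus one constant, irrelevant column.
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(8, 1, (20, 2))])
    X = np.hstack([X, np.ones((40, 1))])
    X_disc = np.floor(X).astype(int)  # crude discretization (assumption)
    cols = reduct_by_indiscernibility(X_disc)
    labels, centres = kmeans(X[:, cols], max_variance_init(X[:, cols], k=2))
    print(cols, centres)
```

Because the initial centres are computed deterministically from the data, repeated runs give the same clustering, which is the practical benefit over random initialization that the abstract points to. The discretization step and the choice of group means rather than medians are design choices of this sketch only.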

Author information

Authors and Affiliations

ABV – Indian Institute of Information Technology and Management, Morena Link Road, Gwalior, 474010, Madhya Pradesh, India
Dharmveer Singh Rajput, P. K. Singh & M. Bhattacharya

Editor information

Editors and Affiliations

Department of Computer Science, Bishop Heber College (Autonomous), 620017, Tiruchirappalli, India
Rajkumar Kannan
National Institute of Informatics (NII), 101-8430, Tokyo, Japan
Frederic Andres

About this paper

Cite this paper

Rajput, D.S., Singh, P.K., Bhattacharya, M. (2012). Reduct and Variance Based Clustering of High Dimensional Dataset. In: Kannan, R., Andres, F. (eds) Data Engineering and Management. ICDEM 2010. Lecture Notes in Computer Science, vol 6411. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27872-3_11

DOI: https://doi.org/10.1007/978-3-642-27872-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27871-6
Online ISBN: 978-3-642-27872-3
eBook Packages: Computer Science, Computer Science (R0)
