Abstract
A measure of cluster quality is often needed in DNA microarray data analysis. In this paper, we introduce a new cluster validity index that measures geometrical features of the data. The essential concept of this index is to evaluate the ratio of the squared total length of the data eigen-axes to the between-cluster separation. We show that this cluster validity index works well for data that contain closely spaced clusters or clusters of different sizes. We verify the method using three simulated data sets, two real-world data sets and two microarray data sets. The experimental results show that the proposed index outperforms five other cluster validity indices: the partition coefficient (PC), the general silhouette index (GS), Dunn’s index (DI), the CH index and the I-index. We also give a theorem that characterizes the situations in which the proposed index works well.
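The abstract gives only the general form of the index, so the following Python sketch is one plausible reading of it and not the paper's definition. It assumes that the eigen-axis lengths of a cluster are twice the square roots of the eigenvalues of its covariance matrix, that the between-cluster separation is the squared distance to the nearest other cluster centre, and that the per-cluster ratios are combined by taking the largest one. The function name geometrical_index_sketch and all three choices are assumptions made for illustration.

```python
import numpy as np


def geometrical_index_sketch(X, labels):
    """Illustrative eigen-axis / separation ratio (hypothetical, not the paper's formula).

    For each cluster, the squared total eigen-axis length is divided by the
    squared distance to the nearest other cluster centre. The largest ratio
    is returned, so smaller values suggest compact, well-separated clusters.
    Every cluster must contain at least two points.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("need at least two clusters")

    centres = np.array([X[labels == k].mean(axis=0) for k in ids])
    ratios = []
    for i, k in enumerate(ids):
        pts = X[labels == k]
        # Eigenvalues of the cluster covariance are the variances along its eigen-axes.
        cov = np.atleast_2d(np.cov(pts, rowvar=False))
        eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
        # Assumption: axis length = 2 * sqrt(eigenvalue), summed over all axes.
        total_len = 2.0 * np.sqrt(eigvals).sum()
        # Assumption: separation = squared distance to the nearest other centre.
        d2 = np.sum((centres - centres[i]) ** 2, axis=1)
        d2[i] = np.inf
        ratios.append(total_len ** 2 / d2.min())
    return max(ratios)
```

Under these assumptions, the index could be evaluated for each candidate number of clusters and the partition with the smallest value preferred. Whether the paper selects the minimum or the maximum is not stated in the abstract.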
Cite this article
Lam, B.S.Y., Yan, H. Assessment of Microarray Data Clustering Results Based on a New Geometrical Index for Cluster Validity. Soft Comput 11, 341–348 (2007). https://doi.org/10.1007/s00500-006-0087-1
DOI: https://doi.org/10.1007/s00500-006-0087-1