Abstract
Identifying intrinsic structure, such as the number of clusters, is a major issue when clustering real data. In this paper we propose CHyGA (Clustering Hybrid Genetic Algorithm), a hybrid genetic algorithm for clustering. CHyGA treats the clustering problem as an optimization problem and searches for an optimal number of clusters, characterized by an optimal distribution of instances among the clusters. CHyGA introduces a new representation of solutions and uses dedicated operators, such as one iteration of K-means as a mutation operator. To deal with nominal data, we propose a new definition of the cluster center concept and demonstrate its properties. Experimental results on classical benchmarks are given.
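To make the idea of using one K-means iteration as a mutation operator concrete, the following is a minimal sketch, not the operator defined in the paper. It assumes purely numerical data, squared Euclidean distance, and a plain label-vector encoding of a candidate solution; the abstract does not describe CHyGA's own representation or its nominal-data cluster center, so the function name `kmeans_mutation` and the encoding are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def kmeans_mutation(points: List[Point], labels: List[int]) -> List[int]:
    """Apply a single K-means iteration to a candidate partition.

    Hypothetical illustration: `labels[i]` is the cluster of `points[i]`.
    This label-vector encoding is an assumption, not CHyGA's representation.
    """
    # Step 1: recompute each non-empty cluster's center as the mean of its members.
    members: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        members.setdefault(label, []).append(point)
    centers = {
        label: [sum(column) / len(cluster) for column in zip(*cluster)]
        for label, cluster in members.items()
    }

    # Step 2: reassign every point to its nearest center.
    def sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Iterating over sorted labels makes ties resolve to the lowest label.
    ordered = sorted(centers)
    return [min(ordered, key=lambda c: sq_dist(p, centers[c])) for p in points]


# Usage: a partition with one misplaced point is repaired by the operator.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
mutated = kmeans_mutation(data, [0, 0, 0, 1])
```

In this sketch a cluster that loses all its members simply disappears from the returned labels, so the operator can reduce the number of clusters but never increase it. Whether CHyGA's operator behaves the same way is not stated in the abstract.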
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vermeulen-Jourdan, L., Dhaenens, C., Talbi, EG. (2004). Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm. In: Gottlieb, J., Raidl, G.R. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2004. Lecture Notes in Computer Science, vol 3004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24652-7_22
DOI: https://doi.org/10.1007/978-3-540-24652-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21367-3
Online ISBN: 978-3-540-24652-7
eBook Packages: Springer Book Archive