Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm

Vermeulen-Jourdan, Laetitia; Dhaenens, Clarisse; Talbi, El-Ghazali

doi:10.1007/978-3-540-24652-7_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3004)

Included in the following conference series:

European Conference on Evolutionary Computation in Combinatorial Optimization

Abstract

Because intrinsic structure, such as the number of clusters, is a major issue when clustering real data, we propose in this paper CHyGA (Clustering Hybrid Genetic Algorithm), a hybrid genetic algorithm for clustering. CHyGA treats the clustering problem as an optimization problem and searches for an optimal number of clusters, characterized by an optimal distribution of instances among the clusters. CHyGA introduces a new representation of solutions and uses dedicated operators, such as one iteration of K-means as a mutation operator. To deal with nominal data, we propose a new definition of the cluster center concept and demonstrate its properties. Experimental results on classical benchmarks are given.
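
To make the idea of "one iteration of K-means as a mutation operator" concrete, the following is a minimal sketch under stated assumptions and not the authors' implementation. The abstract does not give CHyGA's solution representation or its new cluster center definition, so this sketch assumes a plain label-vector encoding and substitutes a well-known stand-in: the mean for numerical attributes and the most frequent value for nominal attributes, with a simple mismatch count as the nominal distance. All function and parameter names (`mixed_distance`, `cluster_center`, `kmeans_mutation`, `numeric_idx`, `nominal_idx`) are hypothetical.

```python
from collections import Counter


def mixed_distance(row, center, numeric_idx, nominal_idx):
    """Squared Euclidean distance on numeric attributes plus a
    mismatch count on nominal attributes (assumed stand-in distance)."""
    d = sum((row[j] - center[j]) ** 2 for j in numeric_idx)
    d += sum(1 for j in nominal_idx if row[j] != center[j])
    return d


def cluster_center(rows, numeric_idx, nominal_idx):
    """Assumed center: mean per numeric attribute, most frequent value
    per nominal attribute. This is NOT the definition from the paper."""
    center = list(rows[0])
    for j in numeric_idx:
        center[j] = sum(r[j] for r in rows) / len(rows)
    for j in nominal_idx:
        center[j] = Counter(r[j] for r in rows).most_common(1)[0][0]
    return center


def kmeans_mutation(data, labels, numeric_idx, nominal_idx):
    """Apply one K-means iteration to an individual encoded as a label
    vector: recompute each cluster's center, then reassign every
    instance to its nearest center."""
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    centers = {
        lab: cluster_center(rows, numeric_idx, nominal_idx)
        for lab, rows in groups.items()
    }
    # Iterate labels in sorted order so ties resolve deterministically.
    return [
        min(
            sorted(centers),
            key=lambda lab: mixed_distance(
                row, centers[lab], numeric_idx, nominal_idx
            ),
        )
        for row in data
    ]
```

In this sketch a cluster that attracts no instance during reassignment simply disappears from the returned label vector, which is one simple way a mutation step could change the number of clusters; whether CHyGA behaves this way is not stated in the abstract.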

Author information

Authors and Affiliations

Bât M3-Cité Scientifique, LIFL-Université de Lille1, 59655, Villeneuve d’Ascq Cedex, France
Laetitia Vermeulen-Jourdan, Clarisse Dhaenens & El-Ghazali Talbi

Editor information

Editors and Affiliations

SAP AG, Neurottstr. 16, 69190, Walldorf, Germany
Jens Gottlieb
Institute of Computer Graphics and Algorithms, Vienna University of Technology, Favoritenstraße 9–11/1861, 1040, Vienna, Austria
Günther R. Raidl

About this paper

Cite this paper

Vermeulen-Jourdan, L., Dhaenens, C., Talbi, EG. (2004). Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm. In: Gottlieb, J., Raidl, G.R. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2004. Lecture Notes in Computer Science, vol 3004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24652-7_22

DOI: https://doi.org/10.1007/978-3-540-24652-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21367-3
Online ISBN: 978-3-540-24652-7
eBook Packages: Springer Book Archive
