Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm

  • Conference paper
Evolutionary Computation in Combinatorial Optimization (EvoCOP 2004)

Abstract

Because discovering intrinsic structure, such as the number of clusters, is a major issue when clustering real data, we propose in this paper CHyGA (Clustering Hybrid Genetic Algorithm), a hybrid genetic algorithm for clustering. CHyGA treats clustering as an optimization problem: it searches for an optimal number of clusters together with an optimal distribution of instances among those clusters. CHyGA introduces a new representation of solutions and uses dedicated operators, such as one iteration of K-means used as a mutation operator. To deal with nominal data, we propose a new definition of the cluster center concept and demonstrate its properties. Experimental results on classical benchmarks are given.
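To make the idea of using one K-means iteration as a mutation operator concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not define CHyGA's solution representation or its new cluster center, so the sketch assumes a plain label-per-instance encoding and substitutes a common stand-in for mixed data (mean for numerical attributes, most frequent value for nominal ones, with a 0/1 mismatch distance). The names `center`, `distance`, `kmeans_mutation`, and `numeric_idx` are hypothetical.

```python
from collections import Counter


def center(rows, numeric_idx):
    """Per-attribute center: mean for numeric columns, mode for nominal ones.

    Assumption: this mean/mode center is a stand-in, not the paper's definition.
    """
    cols = list(zip(*rows))
    return tuple(
        sum(col) / len(col) if j in numeric_idx else Counter(col).most_common(1)[0][0]
        for j, col in enumerate(cols)
    )


def distance(row, ctr, numeric_idx):
    """Squared difference on numeric columns, 0/1 mismatch on nominal ones."""
    return sum(
        (a - b) ** 2 if j in numeric_idx else float(a != b)
        for j, (a, b) in enumerate(zip(row, ctr))
    )


def kmeans_mutation(data, labels, numeric_idx):
    """Apply one K-means iteration to a candidate partition.

    Centers are recomputed from the current labels, then every instance is
    reassigned to its nearest center. Labels that lose all their instances
    simply disappear, so the number of clusters may shrink.
    """
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row)
    centers = {lab: center(rows, numeric_idx) for lab, rows in groups.items()}
    return [
        min(centers, key=lambda lab: distance(row, centers[lab], numeric_idx))
        for row in data
    ]
```

For example, with `data = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.1, "b")]`, `labels = [0, 0, 0, 1]`, and `numeric_idx = {0}`, a single call moves the third instance into cluster 1 and returns `[0, 0, 1, 1]`. In a genetic algorithm, such a step acts as a local improvement applied to one individual rather than a random perturbation.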


Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vermeulen-Jourdan, L., Dhaenens, C., Talbi, EG. (2004). Clustering Nominal and Numerical Data: A New Distance Concept for a Hybrid Genetic Algorithm. In: Gottlieb, J., Raidl, G.R. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2004. Lecture Notes in Computer Science, vol 3004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24652-7_22

  • DOI: https://doi.org/10.1007/978-3-540-24652-7_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21367-3

  • Online ISBN: 978-3-540-24652-7

  • eBook Packages: Springer Book Archive
