Evaluation of Categorical Data Clustering

  • Conference paper
Advances in Intelligent Web Mastering – 3

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 86)

Abstract

Methods of cluster analysis are well-known techniques of multivariate analysis that have been used for many years. Their main applications concern clustering objects characterized by quantitative variables, and for this case various coefficients for evaluating a clustering and determining the number of clusters have been proposed. However, in some areas, e.g., segmentation of Internet users, the variables are often nominal or ordinal because they originate from questionnaire responses. We therefore address evaluation criteria for the case of categorical variables and propose criteria based on variability measures. In place of the variance used for quantitative variables, three measures for nominal variables are considered: a variability measure based on the modal frequency, Gini's coefficient of mutability, and the entropy. The proposed evaluation criteria are applied to a real dataset.
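
As a rough illustration of the three nominal variability measures named in the abstract, the sketch below computes each one from the relative category frequencies of a single variable. It uses the common textbook forms (1 minus the modal relative frequency, 1 minus the sum of squared relative frequencies, and the natural-log entropy); the paper's exact definitions and any normalization are not visible in this preview, so these forms are assumptions. The helper `within_cluster_variability` is likewise hypothetical: it shows one plausible way such a measure could stand in for variance when scoring a clustering, and is not the authors' criterion.

```python
from collections import Counter
from math import log


def relative_frequencies(values):
    """Relative frequency of each category observed in `values`."""
    counts = Counter(values)
    n = len(values)
    return [c / n for c in counts.values()]


def modal_variability(values):
    """1 minus the relative frequency of the modal (most common) category."""
    return 1.0 - max(relative_frequencies(values))


def gini_mutability(values):
    """Gini-style mutability: 1 minus the sum of squared relative frequencies."""
    return 1.0 - sum(p * p for p in relative_frequencies(values))


def entropy(values):
    """Entropy with natural logarithm (assumed base; unnormalized)."""
    return -sum(p * log(p) for p in relative_frequencies(values))


def within_cluster_variability(rows, labels, measure):
    """Hypothetical criterion: size-weighted sum of per-variable variability.

    `rows` is a list of tuples of categorical values, `labels` assigns each
    row to a cluster, and `measure` is one of the functions above.
    """
    n = len(rows)
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    total = 0.0
    for members in clusters.values():
        columns = zip(*members)  # one tuple of values per variable
        total += len(members) / n * sum(measure(list(col)) for col in columns)
    return total
```

Under these assumptions, a variable that takes a single value within a cluster contributes zero to all three measures, so lower totals indicate more homogeneous clusters, in the same way lower within-cluster variance does for quantitative variables.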

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rezankova, H., Loster, T., Husek, D. (2011). Evaluation of Categorical Data Clustering. In: Mugellini, E., Szczepaniak, P.S., Pettenati, M.C., Sokhn, M. (eds) Advances in Intelligent Web Mastering – 3. Advances in Intelligent and Soft Computing, vol 86. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18029-3_18

  • DOI: https://doi.org/10.1007/978-3-642-18029-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-18028-6

  • Online ISBN: 978-3-642-18029-3

  • eBook Packages: Engineering, Engineering (R0)
