Data Reduction Method for Categorical Data Clustering

Rendón, Eréndira; Sánchez, J. Salvador; Garcia, Rene A.; Abundez, Itzel; Gutierrez, Citlalih; Gasca, Eduardo

doi:10.1007/978-3-540-88309-8_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5290)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

Categorical data clustering is an important part of data mining and has recently drawn attention from several researchers. As a step in data mining, however, clustering faces the problem of the large amount of data to be processed. This article offers a solution for categorical clustering algorithms that work with high volumes of data: a method that summarizes the database using a structure called the CM-tree. To test the method, the K-Modes and Click clustering algorithms were run on several databases. The experiments show that the proposed summarization method improves execution time without losing clustering quality.
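As a rough illustration of the idea of clustering a summary instead of the raw records, the following Python sketch runs a basic K-Modes loop (simple matching dissimilarity, per-attribute modes) on a weighted summary. The abstract does not describe the CM-tree itself, so the `summarize` function here is only a stand-in that collapses identical records into (record, count) pairs; it is not the authors' structure. All function names, the toy data, and the choice of initial modes are assumptions made for this example.

```python
from collections import Counter

def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def summarize(records):
    """Stand-in summary (NOT the CM-tree): collapse identical records
    into unique rows plus a count (weight) for each row."""
    counts = Counter(map(tuple, records))
    return list(counts.keys()), list(counts.values())

def weighted_modes(rows, weights):
    """Per-attribute mode of the rows, counting each row `weight` times."""
    modes = []
    for j in range(len(rows[0])):
        counts = Counter()
        for row, w in zip(rows, weights):
            counts[row[j]] += w
        modes.append(counts.most_common(1)[0][0])
    return tuple(modes)

def k_modes(rows, weights, init_modes, max_iter=20):
    """Basic K-Modes on weighted rows; stops when assignments no longer change."""
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda c: matching_distance(r, modes[c]))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each mode from its weighted members.
        for c in range(len(modes)):
            members = [(r, w) for r, w, l in zip(rows, weights, labels) if l == c]
            if members:
                member_rows, member_weights = zip(*members)
                modes[c] = weighted_modes(member_rows, member_weights)
    return modes, labels

# Toy categorical data (hypothetical): six records, three attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
]

rows, weights = summarize(records)  # four unique rows stand in for six records
modes, labels = k_modes(rows, weights, init_modes=[rows[0], rows[2]])
print(modes, labels)
```

The clustering loop only ever iterates over the summarized rows, so its cost depends on the size of the summary rather than on the number of original records; the weights keep the mode computation consistent with the full data.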

Author information

Authors and Affiliations

Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Av. Tecnológico s/n, 52140, Metepec, México
Eréndira Rendón, Rene A. Garcia, Itzel Abundez, Citlalih Gutierrez & Eduardo Gasca
Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Av. Sos Baynat s/n, E-12071, Castelló de la Plana, Spain
J. Salvador Sánchez

Editor information

Editors and Affiliations

ICREA & Universitat Pompeu Fabra, Paseo de Circumvalacion 8, 08003, Barcelona, Spain
Hector Geffner
IST-UTL and INESC-ID, Av. Prof. Cavaco Silva - Taguspark, 2744-016, Porto Salvo, Portugal
Rui Prada
ADETTI/ISCTE and ISCTE, Lisbon University Institute, Av. das Forças Armadas, 1649-026, Lisbon, Portugal
Isabel Machado Alexandre
ADETTI/ISCTE and ISCTE, Lisbon University Institute, Av. das Forças Armadas, 1649-026, Lisbon, Portugal
Nuno David

Cite this paper

Rendón, E., Sánchez, J.S., Garcia, R.A., Abundez, I., Gutierrez, C., Gasca, E. (2008). Data Reduction Method for Categorical Data Clustering. In: Geffner, H., Prada, R., Machado Alexandre, I., David, N. (eds) Advances in Artificial Intelligence – IBERAMIA 2008. IBERAMIA 2008. Lecture Notes in Computer Science, vol 5290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88309-8_15

DOI: https://doi.org/10.1007/978-3-540-88309-8_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88308-1
Online ISBN: 978-3-540-88309-8
eBook Packages: Computer Science, Computer Science (R0)
