MOSAIC: A Proximity Graph Approach for Agglomerative Clustering

Choo, Jiyeon; Jiamthapthaksin, Rachsuda; Chen, Chun-sheng; Celepcikay, Oner Ulvi; Giusti, Christian; Eick, Christoph F.

doi:10.1007/978-3-540-74553-2_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Representative-based clustering algorithms are popular because they are relatively fast and rest on a sound theoretical foundation. However, the clusters they can find are limited to convex shapes, and their results are highly sensitive to initialization. This paper proposes MOSAIC, a novel agglomerative clustering algorithm that greedily merges neighboring clusters so as to maximize a given fitness function. MOSAIC uses Gabriel graphs to determine which clusters are neighbors, and it approximates non-convex shapes as unions of small clusters computed by a representative-based clustering algorithm. Experimental results show that this technique yields clusters of higher quality than running a representative-based clustering algorithm on its own. Given a suitable fitness function, MOSAIC can detect clusters of arbitrary shape. In addition, MOSAIC is capable of dealing with high-dimensional data.
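The two ingredients named in the abstract, Gabriel-graph neighborhood and greedy fitness-driven merging, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `gabriel_edges` and `greedy_merge`, the `fitness` callable (higher is better), the brute-force neighbor test, and the stopping rule (stop as soon as no merge improves fitness) are all assumptions made for illustration; the abstract does not specify them. The sketch uses the standard definition of a Gabriel graph: two points are neighbors when no third point lies strictly inside the ball whose diameter is the segment joining them.

```python
import numpy as np


def gabriel_edges(reps):
    """Return the set of index pairs (i, j), i < j, that are Gabriel neighbors.

    reps holds one representative (e.g. a centroid) per initial cluster.
    Brute force, cubic in the number of representatives.
    """
    reps = np.asarray(reps, dtype=float)
    n = len(reps)
    # Squared Euclidean distances between all pairs of representatives.
    d2 = ((reps[:, None, :] - reps[None, :, :]) ** 2).sum(axis=-1)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # k lies strictly inside the ball with diameter (i, j)
            # exactly when d(i,k)^2 + d(j,k)^2 < d(i,j)^2.
            blocked = any(
                d2[i, k] + d2[j, k] < d2[i, j]
                for k in range(n)
                if k != i and k != j
            )
            if not blocked:
                edges.add((i, j))
    return edges


def greedy_merge(labels, edges, fitness):
    """Greedily merge neighboring clusters while the fitness improves.

    labels:  initial cluster id per data point (ids index the representatives)
    edges:   neighbor pairs between cluster ids, e.g. from gabriel_edges
    fitness: hypothetical callable, label array -> number, higher is better
    """
    labels = np.asarray(labels).copy()
    edges = set(edges)
    best = fitness(labels)
    while edges:
        # Score every candidate merge of two neighboring clusters.
        scored = []
        for a, b in edges:
            trial = np.where(labels == b, a, labels)
            scored.append((fitness(trial), a, b))
        score, a, b = max(scored)
        if score <= best:
            break  # assumed stopping rule: no merge improves the fitness
        labels[labels == b] = a
        best = score
        # Cluster b no longer exists: redirect its edges to a, drop self-loops.
        remapped = set()
        for u, v in edges:
            u = a if u == b else u
            v = a if v == b else v
            if u != v:
                remapped.add((min(u, v), max(u, v)))
        edges = remapped
    return labels, best
```

In this sketch the initial labels would come from any representative-based algorithm such as k-means run with a deliberately large number of clusters, and the Gabriel graph is built once on the representatives and then contracted as clusters merge. Restricting candidate merges to graph neighbors is what keeps the search from pairing clusters that are separated by other clusters.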

Author information

Authors and Affiliations

Computer Science Department, University of Houston, Houston, TX 77204-3010, USA
Jiyeon Choo, Rachsuda Jiamthapthaksin, Chun-sheng Chen, Oner Ulvi Celepcikay & Christoph F. Eick
Department of Mathematics and Computer Science, University of Udine, Via delle Scienze, 33100, Udine, Italy
Christian Giusti

Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen

About this paper

Cite this paper

Choo, J., Jiamthapthaksin, R., Chen, Cs., Celepcikay, O.U., Giusti, C., Eick, C.F. (2007). MOSAIC: A Proximity Graph Approach for Agglomerative Clustering. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_21

DOI: https://doi.org/10.1007/978-3-540-74553-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science, Computer Science (R0)
