M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining

Ng, Michael; Huang, Joshua

doi:10.1007/3-540-47887-6_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

This paper presents M-FastMap, a modified FastMap algorithm for visual cluster validation in data mining. In visual cluster validation with FastMap, clusters are first generated from a database with a clustering algorithm. The FastMap algorithm then projects the clusters onto a 2-dimensional (2D) or 3-dimensional (3D) space, where they are displayed with different colors and/or symbols. From the display, a human can visually examine the separation of the clusters. The method rests on the principle that if a cluster is separate from the others in the projected 2D (or 3D) space, it is also separate from them in the original high-dimensional space; the converse does not hold. M-FastMap improves the quality of visual cluster validation by selecting the pivot objects (that is, the projection axes) so as to optimize the separation of clusters in the 2D (or 3D) space. A comparison study shows that the modified FastMap algorithm can produce better visualization results than the original FastMap algorithm.
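The projection step can be illustrated with a minimal sketch of the original FastMap heuristic (Faloutsos and Lin, 1995). It is not the M-FastMap pivot selection, which the abstract does not specify. The sketch assumes Euclidean input points; the names `fastmap`, `points`, `k` and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fastmap(points, k=2, seed=0):
    """Project an (n x d) array onto k axes with the original FastMap heuristic.

    Assumption: distances are Euclidean. This is an illustrative sketch,
    not the modified pivot selection of M-FastMap.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    # Squared pairwise distances (O(n^2) memory, fine for a small demo).
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    coords = np.zeros((n, k))
    for axis in range(k):
        # Pivot heuristic: start at a random object, then jump twice
        # to the farthest object to get a distant pivot pair (a, b).
        a = int(rng.integers(n))
        b = int(np.argmax(d2[a]))
        a = int(np.argmax(d2[b]))
        dab2 = d2[a, b]
        if dab2 <= 1e-12:
            break  # no residual spread left; remaining axes stay zero
        # Cosine-law projection of every object onto the line through a and b.
        x = (d2[a] + dab2 - d2[b]) / (2.0 * np.sqrt(dab2))
        coords[:, axis] = x
        # Residual squared distances in the hyperplane orthogonal to that line.
        d2 = np.maximum(d2 - (x[:, None] - x[None, :]) ** 2, 0.0)
    return coords

# Two synthetic, well-separated clusters in 5 dimensions (made-up demo data).
demo_rng = np.random.default_rng(1)
pts = np.vstack([
    demo_rng.normal(0.0, 0.1, size=(20, 5)),
    demo_rng.normal(0.0, 0.1, size=(20, 5)) + 10.0,
])
labels = np.array([0] * 20 + [1] * 20)
Y = fastmap(pts, k=2)  # 2D coordinates, ready for a scatter plot colored by label
```

For Euclidean input, each axis is an orthogonal projection, so projected distances never exceed the original ones. This is the reason separation seen in the 2D plot carries over to the original space, while overlap in the plot does not by itself imply overlap in the original space.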

Supported in part by RGC Grant No. 7132/00P and HKU CRCG Grant Nos. 10203501, 10203907 and 10203408.



Author information

Authors and Affiliations

Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong
Michael Ng
E-Business Technology Institute, The University of Hong Kong, Pokfulam Road, Hong Kong
Joshua Huang


Editor information

Editors and Affiliations

EE Department, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, Taiwan, ROC
Ming-Syan Chen
IBM Thomas J. Watson Research Center, 30 Sawmill River Road, Hawthorne, NY, 10532, USA
Philip S. Yu
School of Computing, National University of Singapore, Lower Kent Ridge Road, Singapore, 119260
Bing Liu


About this paper

Cite this paper

Ng, M., Huang, J. (2002). M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_22


DOI: https://doi.org/10.1007/3-540-47887-6_22
Published: 29 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive
