M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

This paper presents M-FastMap, a modified FastMap algorithm for visual cluster validation in data mining. In visual cluster validation with FastMap, clusters are first generated from a database with a clustering algorithm. The FastMap algorithm then projects the clusters onto a 2-dimensional (2D) or 3-dimensional (3D) space, where they are displayed with different colors and/or symbols. From the display, a human can visually examine the separation of clusters. This method follows the principle that if a cluster is separate from the others in the projected 2D (or 3D) space, it is also separate from them in the original high-dimensional space (the converse does not hold). The modified FastMap algorithm improves the quality of visual cluster validation by optimizing the separation of clusters in the 2D (or 3D) space when selecting the pivot objects (that is, the projection axes). A comparison study has shown that the modified FastMap algorithm can produce better visualization results than the original FastMap algorithm.
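To make the projection step concrete, here is a minimal Python sketch of the original FastMap projection (Faloutsos and Lin), not of M-FastMap itself. It assumes the objects are numeric vectors compared with Euclidean distance and uses the standard "farthest pair" pivot heuristic. The function name `fastmap` and its parameters are illustrative choices, not taken from the paper. The abstract says only that M-FastMap changes how the pivot objects are selected so that cluster separation is optimized; that selection rule is not described here, so the sketch does not attempt to reproduce it.

```python
import math


def fastmap(points, k=2):
    """Project `points` (equal-length numeric tuples) onto k dimensions.

    Illustrative sketch of the original FastMap heuristic; the pivot
    choice below is NOT the M-FastMap selection rule.
    """
    n = len(points)
    coords = [[0.0] * k for _ in range(n)]

    def dist2(i, j, dim):
        # Squared Euclidean distance between objects i and j, minus the part
        # already explained by the first `dim` projected coordinates.
        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # clip tiny negative values from rounding

    def farthest(i, dim):
        # Index of the object with the largest residual distance from i.
        return max(range(n), key=lambda j: dist2(i, j, dim))

    for dim in range(k):
        # Pivot heuristic: start at object 0, jump to the farthest object,
        # then to the object farthest from that one.
        b = farthest(0, dim)
        a = farthest(b, dim)
        dab2 = dist2(a, b, dim)
        if dab2 == 0.0:
            break  # no residual spread left; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][dim] = (dist2(a, i, dim) + dab2 - dist2(b, i, dim)) / (2 * dab)
    return coords
```

For Euclidean input, each step is an orthogonal projection, so distances in the projected space never exceed the original distances. This is the reason a separation visible in the 2D (or 3D) display implies separation in the original space, while overlap in the display does not imply overlap in the original space.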

Supported in part by RGC Grant No. 7132/00P and HKU CRCG Grant Nos. 10203501, 10203907 and 10203408.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ng, M., Huang, J. (2002). M-FastMap: A Modified FastMap Algorithm for Visual Cluster Validation in Data Mining. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_22

  • DOI: https://doi.org/10.1007/3-540-47887-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4

  • eBook Packages: Springer Book Archive
