AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2637)

Abstract

The clustering algorithm GDILC combines density-based clustering with a grid and is designed to discover clusters of arbitrary shape and to eliminate noise. However, it does not scale to large high-dimensional datasets. In this paper, we improve the algorithm in five important directions. With these improvements, the resulting algorithm, AGRID, is highly scalable and can process large high-dimensional datasets. It discovers clusters of various shapes and eliminates noise effectively. In addition, it is insensitive to the order of input and is nonparametric. Our experiments demonstrate the high speed and accuracy of AGRID.
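
As a rough illustration of the general idea of grid-based density clustering mentioned in the abstract, the sketch below bins points into cells, treats cells with enough points as dense, and merges adjacent dense cells into clusters. It is not the AGRID or GDILC procedure, whose details are not given here; the function name `grid_density_clusters` and the parameters `cell_size` and `min_density` are assumptions made for this example only.

```python
from collections import defaultdict, deque
from itertools import product


def grid_density_clusters(points, cell_size, min_density):
    """Return one label per point: a cluster id, or -1 for noise.

    Generic grid-density sketch (hypothetical helper, not AGRID itself).
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # A cell counts as dense when it holds at least min_density points.
    dense = {k for k, members in cells.items() if len(members) >= min_density}

    labels = [-1] * len(points)
    dims = len(points[0]) if points else 0
    # All neighbouring cell offsets except the zero offset: 3**dims - 1 of them.
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]

    seen = set()
    cluster_id = 0
    # Visiting dense cells in sorted order keeps labels independent of input order.
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            # Breadth-first expansion into adjacent dense cells.
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels
```

Points in sparse cells keep the label -1 and are treated as noise. Note that this naive version enumerates 3^d - 1 neighbouring cells per dense cell in d dimensions, which grows exponentially with d; this is one reason a plain grid approach struggles on high-dimensional data.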

Participation in the conference was supported by NOKIA.


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yanchang, Z., Junde, S. (2003). AGRID: An Efficient Algorithm for Clustering Large High-Dimensional Datasets. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_27

  • DOI: https://doi.org/10.1007/3-540-36175-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-04760-5

  • Online ISBN: 978-3-540-36175-6

  • eBook Packages: Springer Book Archive
