
Supporting KDD Applications by the k-Nearest Neighbor Join

  • Conference paper
Database and Expert Systems Applications (DEXA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2736)

Abstract

The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of similarity join are well known: the distance range join, where the user defines a distance threshold for the join, and the closest point query or k-distance join, which retrieves the k most similar pairs. In this paper, we propose an important third similarity join operation, called the k-nearest neighbor join, which combines each point of one point set with its k nearest neighbors in the other set. We show that many standard algorithms of Knowledge Discovery in Databases (KDD), such as k-means and k-medoid clustering, nearest neighbor classification, data cleansing, and postprocessing of sampling-based data mining, can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. Our list of possible applications includes standard methods for all stages of the KDD process, including preprocessing, data mining, and postprocessing. Thus, our method accelerates the complete KDD process.
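To make the difference between the three join types concrete, the following is a minimal brute-force sketch in Python. It is only an illustration of the definitions given in the abstract, not the algorithm proposed in the paper. The function names, the use of Euclidean distance, and the sample point sets R and S are assumptions made for this example.

```python
import heapq
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def distance_range_join(R, S, eps):
    """All pairs (r, s) whose distance is at most the threshold eps."""
    return [(r, s) for r in R for s in S if dist(r, s) <= eps]


def k_distance_join(R, S, k):
    """The k closest pairs taken over the whole cross product R x S."""
    pairs = ((r, s) for r in R for s in S)
    return heapq.nsmallest(k, pairs, key=lambda p: dist(p[0], p[1]))


def knn_join(R, S, k):
    """Each point r of R combined with its k nearest neighbors in S."""
    return {r: heapq.nsmallest(k, S, key=lambda s: dist(r, s)) for r in R}


# Hypothetical sample data: points are tuples so they can serve as dict keys.
R = [(0.0, 0.0), (10.0, 10.0)]
S = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0), (20.0, 20.0)]

if __name__ == "__main__":
    print(distance_range_join(R, S, 2.0))
    print(k_distance_join(R, S, 2))
    print(knn_join(R, S, 2))
```

Note that the k-nn join returns exactly k partners for every point of R, whereas the result size of the distance range join depends on the threshold, and the k-distance join returns k pairs in total. This sketch compares every pair of points, so it only shows what each join computes and says nothing about how to compute it efficiently.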




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Böhm, C., Krebs, F. (2003). Supporting KDD Applications by the k-Nearest Neighbor Join. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_50

  • DOI: https://doi.org/10.1007/978-3-540-45227-0_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40806-2

  • Online ISBN: 978-3-540-45227-0

  • eBook Packages: Springer Book Archive
