Skip to main content

An Indexing Structure for Dynamic Multidimensional Data in Vector Space

  • Conference paper
Advances in Databases and Information Systems

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 186))

  • 856 Accesses

Abstract

The multidimensional kNN (k nearest neighbors) query problem is relevant to a large variety of database applications, including information retrieval, natural language processing, and data mining. To solve it efficiently, the database needs an indexing structure that provides this kind of search. However, attempts to find an exact solution are hardly feasible in multidimensional space. In this paper, a novel indexing technique for the approximate solution of kNN problem is described and analyzed. The construction of the indexing tree is based on clustering. Indexing structure is implemented on top of high-performance industrial DBMS.

This research is supported by Russian Foundation for Basic Research, grant 10-07-00156.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008), http://doi.acm.org/10.1145/1327452.1327494 , doi:10.1145/1327452.1327494

    Article  Google Scholar 

  2. Barton, S., Gouet-Brunet, V., Rukoz, M.: Large scale disk-based metric indexing structure for approximate information retrieval by content. In: Proceedings of the 1st Workshop on New Trends in Similarity Search, NTSS 2011, pp. 2–7. ACM, New York (2011), http://doi.acm.org/10.1145/1966865.1966869 , doi:10.1145/1966865.1966869

    Chapter  Google Scholar 

  3. Chen, J., Fang, H.R., Saad, Y.: Fast approximate knn graph construction for high dimensional data via recursive lanczos bisection. J. Mach. Learn. Res. 10, 1989–2012 (2009), http://dl.acm.org/citation.cfm?id=1577069.1755852

    MathSciNet  MATH  Google Scholar 

  4. Ciaccia, P., Patella, M.: Bulk loading the m-tree. In: Proceedings of the 9th Australasian Database Conference, ADC 1998, pp. 15–26 (1998)

    Google Scholar 

  5. Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd International Conference on Very Large Data Bases, VLDB 1997, pp. 426–435. Morgan Kaufmann Publishers Inc., San Francisco (1997), http://dl.acm.org/citation.cfm?id=645923.671005

    Google Scholar 

  6. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of the Twentieth Annual Symposium on Computational Geometry, SCG 2004, pp. 253–262. ACM, New York (2004), http://doi.acm.org/10.1145/997817.997857 , doi:10.1145/997817.997857

    Chapter  Google Scholar 

  7. Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB 1999, pp. 518–529. Morgan Kaufmann Publishers Inc., San Francisco (1999), http://dl.acm.org/citation.cfm?id=645925.671516

    Google Scholar 

  8. Gudmundsson, G.T., Jónsson, B.T., Amsaleg, L.: A large-scale performance study of cluster-based high-dimensional indexing. In: Proceedings of the International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval, VLS-MCMR 2010, pp. 31–36. ACM, New York (2010), http://doi.acm.org/10.1145/1878137.1878145 , doi:10.1145/1878137.1878145

    Chapter  Google Scholar 

  9. Günnemann, S., Kremer, H., Lenhard, D., Seidl, T.: Subspace clustering for indexing high dimensional data: a main memory index based on local reductions and individual multi-representations. In: Proceedings of the 14th International Conference on Extending Database Technology, EDBT/ICDT 2011, pp. 237–248. ACM, New York (2011), http://doi.acm.org/10.1145/1951365.1951395 , doi:10.1145/1951365.1951395

    Chapter  Google Scholar 

  10. Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, SIGMOD 1984, pp. 47–57. ACM, New York (1984), http://doi.acm.org/10.1145/602259.602266 , doi:10.1145/602259.602266

    Chapter  Google Scholar 

  11. Guttman, A.: R-trees: a dynamic index structure for spatial searching. SIGMOD Rec. 14(2), 47–57 (1984), http://doi.acm.org/10.1145/971697.602266 , doi:10.1145/971697.602266

    Article  Google Scholar 

  12. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 604–613. ACM, New York (1998), http://doi.acm.org/10.1145/276698.276876 , doi:10.1145/276698.276876

    Chapter  Google Scholar 

  13. Moreno-Seco, F., Milcó, L., Oncina, J.: A modification of the laesa algorithm for approximated k-nn classification. Pattern Recogn. Lett. 24(1-3), 47–53 (2003), http://dx.doi.org/10.1016/S0167-86550200187-3 , doi:10.1016/S0167-8655(02)00187-3

    Article  MATH  Google Scholar 

  14. Radovanović, M., Nanopoulos, A., Ivanović, M.: Hubs in space: Popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 11, 2487–2531 (2010), http://dl.acm.org/citation.cfm?id=1756006.1953015

    MathSciNet  MATH  Google Scholar 

  15. Sharifzadeh, M., Shahabi, C.: Vor-tree: R-trees with voronoi diagrams for efficient processing of spatial nearest neighbor queries. Proc. VLDB Endow. 3(1-2), 1231–1242 (2010), http://dl.acm.org/citation.cfm?id=1920841.1920994

    Google Scholar 

  16. Shen, H.T., Zhou, X., Zhou, A.: An adaptive and dynamic dimensionality reduction method for high-dimensional indexing. The VLDB Journal 16(2), 219–234 (2007), http://dx.doi.org/10.1007/s00778-005-0167-3 , doi:10.1007/s00778-005-0167-3

    Article  Google Scholar 

  17. Skopal, T.: Where are you heading, metric access methods?: a provocative survey. In: Proceedings of the Third International Conference on SImilarity Search and Applications, SISAP 2010, pp. 13–21. ACM, New York (2010), http://doi.acm.org/10.1145/1862344.1862347 , doi:10.1145/1862344.1862347

    Chapter  Google Scholar 

  18. Skopal, T., Pokorný, J., Krátký, M., Snášel, V.: Revisiting M-Tree Building Principles. In: Kalinichenko, L.A., Manthey, R., Thalheim, B., Wloka, U. (eds.) ADBIS 2003. LNCS, vol. 2798, pp. 148–162. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  19. Thomasian, A., Zhang, L.: Persistent clustered main memory index for accelerating k-nn queries on high dimensional datasets. Multimedia Tools Appl. 38(2), 253–270 (2008), http://dx.doi.org/10.1007/s11042-007-0179-7 , doi:10.1007/s11042-007-0179-7

    Article  Google Scholar 

  20. Traina Jr., C., Traina, A.J.M., Seeger, B., Faloutsos, C.: Slim-Trees: High Performance Metric Trees Minimizing Overlap between Nodes. In: Zaniolo, C., Grust, T., Scholl, M.H., Lockemann, P.C. (eds.) EDBT 2000. LNCS, vol. 1777, pp. 51–65. Springer, Heidelberg (2000), http://dl.acm.org/citation.cfm?id=645339.650146

    Chapter  Google Scholar 

  21. Vidal, E.: New formulation and improvements of the nearest-neighbour approximating and eliminating search algorithm (aesa). Pattern Recognition Letters 15(1), 1–7 (1994)

    Article  MathSciNet  Google Scholar 

  22. Xu, J., Zheng, B., Lee, W.C., Lun Lee, D.: The d-tree: An index structure for planar point queries in location-based wireless services. IEEE Trans. on Knowl. and Data Eng. 16(12), 1526–1542 (2004), http://dx.doi.org/10.1109/TKDE.2004.97 , doi:10.1109/TKDE.2004.97

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Elena Mikhaylova .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mikhaylova, E., Novikov, B., Volokhov, A. (2013). An Indexing Structure for Dynamic Multidimensional Data in Vector Space. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32741-4_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-32741-4_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32740-7

  • Online ISBN: 978-3-642-32741-4

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics