Skip to main content

Engineering Efficient and Effective Non-metric Space Library

  • Conference paper
Similarity Search and Applications (SISAP 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8199))

Included in the following conference series:

Abstract

We present a new similarity search library and discuss a variety of design and performance issues related to its development. We adopt a position that engineering is equally important to design of the algorithms and pursue a goal of producing realistic benchmarks. To this end, we pay attention to various performance aspects and utilize modern hardware, which provides a high degree of parallelization. Since we focus on realistic measurements, performance of the methods should not be measured using merely the number of distance computations performed, because other costs, such as computation of a cheaper distance function, which approximates the original one, are oftentimes substantial. The paper includes preliminary experimental results, which support this point of view. Rather than looking for the best method, we want to ensure that the library implements competitive baselines, which can be useful for future work.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Amato, G., Rabitti, F., Savino, P., Zezula, P.: Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inf. Syst. 21(2), 192–227 (2003)

    Article  Google Scholar 

  2. Amato, G., Savino, P.: Approximate similarity search in metric spaces using inverted files. In: Proceedings of the 3rd International Conference on Scalable Information Systems, InfoScale 2008, pp. 28:1–28:10. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels (2008)

    Google Scholar 

  3. Bhattacharya, P., Neamtiu, I.: Assessing programming language impact on development and maintenance: a study on C and C++. In: 33rd International Conference on Software Engineering (ICSE), pp. 171–180 (2011)

    Google Scholar 

  4. Cayton, L.: Fast nearest neighbor retrieval for bregman divergences. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 112–119. ACM, New York (2008)

    Chapter  Google Scholar 

  5. Chávez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. ACM Computing Surveys 33(3), 273–321 (2001)

    Article  Google Scholar 

  6. Chávez, E., Navarro, G.: Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces. Information Processing Letters 85(1), 39–46 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  7. Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling lsh for performance tuning. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM 2008, pp. 669–678. ACM, New York (2008)

    Google Scholar 

  8. Drepper, U.: What every programmer should know about memory (2007), http://www.akkadia.org/drepper/cpumemory.pdf (last checked August 2012)

  9. Eddelbuettel, D., Francois, R.: Rcpp: Seamless R and C++ integration. Journal of Statistical Software 40(8), 1–18 (2011)

    Google Scholar 

  10. Elizarov, R.: Millions quotes per second in pure Java (2013), http://blog.devexperts.com/millions-quotes-per-second-in-pure-java/ (last accessed on May 14, 2013)

  11. Esuli, A.: Use of permutation prefixes for efficient and scalable approximate similarity search. Inf. Process. Manage. 48(5), 889–902 (2012)

    Article  Google Scholar 

  12. Faloutsos, C.: Searching Multimedia Databases by Content. Kluwer Academic Publisher (1996)

    Google Scholar 

  13. Figueroa, K., Navarro, G., Chávez, E.: Metric Spaces Library (2007), http://www.sisap.org/Metric_Space_Library.html

  14. Figueroa, K., Fredriksson, K.: Speeding up permutation based indexing with indexing. In: Proceedings of the 2009 Second International Workshop on Similarity Search and Applications, SISAP 2009, pp. 107–114. IEEE Computer Society, Washington, DC (2009)

    Chapter  Google Scholar 

  15. Fredriksson, K.: Engineering efficient metric indexes. Pattern Recognition Letters 28(1), 75–84 (2007)

    Article  Google Scholar 

  16. Fulgham, B.: The computer language benchmarks game (2013), http://benchmarksgame.alioth.debian.org/ (last accessed on May 14, 2013)

  17. Gonzalez, E., Figueroa, K., Navarro, G.: Effective proximity retrieval by ordering permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(9), 1647–1658 (2008)

    Article  Google Scholar 

  18. Gonzalez, E.C., Figueroa, K., Navarro, G.: Effective proximity retrieval by ordering permutations. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(9), 1647–1658 (2008)

    Article  Google Scholar 

  19. Hedges, L.V., Vevea, J.L.: Fixed-and random-effects models in meta-analysis. Psychological Methods 3(4), 486–504 (1998)

    Article  Google Scholar 

  20. Hundt, R.: Loop recognition in C++/Java/Go/Scala. In: Proceedings of Scala Days 2011 (2011)

    Google Scholar 

  21. Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J.E., O’Rourke, J. (eds.) Handbook of Discrete and Computational Geometry, pp. 877–892. Chapman and Hall/CRC (2004)

    Google Scholar 

  22. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 604–613. ACM, New York (1998)

    Chapter  Google Scholar 

  23. Jacobs, D., Weinshall, D., Gdalyahu, Y.: Classification with nonmetric distances: Image retrieval and class representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(6), 583–600 (2000)

    Article  Google Scholar 

  24. King, G.: How not to lie with statistics: Avoiding common mistakes in quantitative political science. American Journal of Political Science, 666–687 (1986)

    Google Scholar 

  25. King, R.S.: The top 10 programming languages (the data). IEEE Spectrum 48(10), 84–84 (2011)

    Article  Google Scholar 

  26. Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC 1998, pp. 614–623. ACM, New York (1998)

    Google Scholar 

  27. Lokoč, J., Hetland, M.L., Skopal, T., Beecks, C.: Ptolemaic indexing of the signature quadratic form distance. In: Proceedings of the Fourth International Conference on SImilarity Search and APplications, SISAP 2011, pp. 9–16. ACM, New York (2011)

    Google Scholar 

  28. Mu, Y., Yan, S.: Non-metric locality-sensitive hashing. In: AAAI (2010)

    Google Scholar 

  29. Novak, D., Kyselak, M., Zezula, P.: On locality-sensitive indexing in generic metric spaces. In: Proceedings of the Third International Conference on SImilarity Search and APplications, SISAP 2010, pp. 59–66. ACM, New York (2010)

    Chapter  Google Scholar 

  30. Parri, J., Shapiro, D., Bolic, M., Groza, V.: Returning control to the programmer: Simd intrinsics for virtual machines. Commun. ACM 54(4), 38–43 (2011)

    Article  Google Scholar 

  31. Pestov, V.: Indexability, concentration, and VC theory. Journal of Discrete Algorithms 13, 2–18 (2012); Best Papers from the 3rd International Conference on Similarity Search and Applications (SISAP 2010)

    Google Scholar 

  32. Pestov, V.: Is the k-NN classifier in high dimensions affected by the curse of dimensionality? Computers & Mathematics with Applications (2012)

    Google Scholar 

  33. Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc. (2005)

    Google Scholar 

  34. Scott, D.W., Thompson, J.R.: Probability density estimation in higher dimensions. Technical Report, Rice University, Texas Huston (1983)

    Google Scholar 

  35. Shafi, A., Carpenter, B., Baker, M., Hussain, A.: A comparative study of java and c performance in two large-scale parallel applications. Concurrency and Computation: Practice and Experience 21(15), 1882–1906 (2009)

    Article  Google Scholar 

  36. Skopal, T.: Unified framework for fast exact and approximate search in dissimilarity spaces. ACM Trans. Database Syst. 32(4) (November 2007)

    Google Scholar 

  37. Skopal, T., Bustos, B.: On nonmetric similarity search problems in complex domains. ACM Comput. Surv. 43(4), 34:1–34:50 (October 2011)

    Google Scholar 

  38. Uhlmann, J.: Satisfying general proximity similarity queries with metric trees. Information Processing Letters 40, 175–179 (1991)

    Article  Google Scholar 

  39. Vivanco, R.A., Pizzi, N.J.: Scientific computing with Java and C++: a case study using functional magnetic resonance neuroimages. Software: Practice and Experience 35(3), 237–254 (2005)

    Article  Google Scholar 

  40. Weber, R., Schek, H.J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of the 24th International Conference on Very Large Data Bases, pp. 194–205. Morgan Kaufmann (August 1998)

    Google Scholar 

  41. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer-Verlag New York, Inc., Secaucus (2005)

    Google Scholar 

  42. Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with m-trees. The VLDB Journal 7(4), 275–293 (1998)

    Article  Google Scholar 

  43. Zhang, Z., Ooi, B.C., Parthasarathy, S., Tung, A.K.H.: Similarity search on bregman divergence: towards non-metric indexing. Proc. VLDB Endow. 2(1), 13–24 (2009)

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boytsov, L., Naidan, B. (2013). Engineering Efficient and Effective Non-metric Space Library. In: Brisaboa, N., Pedreira, O., Zezula, P. (eds) Similarity Search and Applications. SISAP 2013. Lecture Notes in Computer Science, vol 8199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41062-8_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-41062-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41061-1

  • Online ISBN: 978-3-642-41062-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics