Some Branches May Bear Rotten Fruits: Diversity Browsing VP-Trees

  • Conference paper
Similarity Search and Applications (SISAP 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12440)

Abstract

Diversified similarity searching embeds result diversification directly into the query procedure, which improves computational performance by orders of magnitude. Although metric indexes have untapped potential for such procedures, building a suitable, fast, and incremental solution for diversified similarity searching remains an open issue. This study presents a novel index-and-search algorithm, coined diversity browsing, that combines an optimized implementation of the vantage-point tree (VP-Tree) index with the distance browsing search strategy and coverage-based query criteria. Our proposal maps data elements into VP-Tree nodes, which are incrementally evaluated to solve diversified neighborhood searches. The evaluation relies not only on the distance between the query and the candidate objects but also on the distances from each candidate to the data elements (called influencers) already in the partial search result. We exploit those distance-based relationships to prune VP-Tree branches that are themselves influenced by elements in the result set. As a result, diversity browsing benefits from data indexing by (i) eliminating nodes without valid candidate elements, and (ii) examining the minimum number of partitions with respect to the query element. Experiments with real-world datasets show that our approach outperformed the competitors GMC and GNE by at least 4.91 orders of magnitude, and the baseline algorithm BRID\(_k\) by at least \(87.51\%\), in elapsed query time.
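The general idea of browsing a VP-Tree incrementally and filtering candidates by influence can be illustrated with a minimal sketch. It is not the authors' optimized implementation: the names `build`, `browse`, and `diverse_knn` are hypothetical, the tree layout (first element as vantage point, median radius) is an assumption, and the influence rule used here (a result element `r` influences a candidate `c` when `d(r, c)` is no larger than both `d(r, q)` and `d(c, q)`) is assumed for illustration only. The sketch filters individual candidates and omits the branch-level pruning described in the abstract.

```python
import heapq
import itertools
import math


def build(points, dist):
    """Build a VP-tree as nested dicts; the first point of each subset is the vantage point."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return {"vp": vp, "mu": 0.0, "inner": None, "outer": None}
    dists = [dist(vp, p) for p in rest]
    mu = sorted(dists)[len(dists) // 2]  # median radius splits the subset
    inner = [p for p, d in zip(rest, dists) if d < mu]
    outer = [p for p, d in zip(rest, dists) if d >= mu]
    return {"vp": vp, "mu": mu, "inner": build(inner, dist), "outer": build(outer, dist)}


def browse(root, q, dist):
    """Yield (distance, element) pairs in nondecreasing distance from q."""
    tie = itertools.count()  # unique tie-breaker so heap never compares payloads
    heap = [(0.0, next(tie), True, root)] if root else []
    while heap:
        key, _, is_node, item = heapq.heappop(heap)
        if not is_node:
            yield key, item  # key is the exact distance of this element
            continue
        d = dist(q, item["vp"])
        heapq.heappush(heap, (d, next(tie), False, item["vp"]))
        # Triangle-inequality lower bounds for each partition.
        if item["inner"]:
            heapq.heappush(heap, (max(key, d - item["mu"]), next(tie), True, item["inner"]))
        if item["outer"]:
            heapq.heappush(heap, (max(key, item["mu"] - d), next(tie), True, item["outer"]))


def diverse_knn(root, q, k, dist):
    """Collect up to k elements, skipping candidates influenced by the partial result."""
    result = []
    for dc, c in browse(root, q, dist):
        # Assumed influence rule: r influences c if d(r, c) <= min(d(r, q), d(c, q)).
        if not any(dist(r, c) <= min(dr, dc) for dr, r in result):
            result.append((dc, c))
            if len(result) == k:
                break
    return result


pts = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (5.0, 5.0)]
tree = build(pts, math.dist)
# (1.1, 0.0) is skipped because it lies too close to the already selected (1.0, 0.0).
print([p for _, p in diverse_knn(tree, (0.0, 0.0), 2, math.dist)])
# → [(1.0, 0.0), (0.0, 2.0)]
```

Because `browse` is a generator, the traversal stops as soon as `k` non-influenced elements are found, which is what makes the search incremental in this sketch.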

M. Bedo—This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) - Finance Code 001 and Research Support Foundation of Rio de Janeiro State - G. E-26/010.101237/2018.

Corresponding author

Correspondence to Marcos Bedo.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Jasbick, D., Santos, L., de Oliveira, D., Bedo, M. (2020). Some Branches May Bear Rotten Fruits: Diversity Browsing VP-Trees. In: Satoh, S., et al. Similarity Search and Applications. SISAP 2020. Lecture Notes in Computer Science, vol 12440. Springer, Cham. https://doi.org/10.1007/978-3-030-60936-8_11

  • DOI: https://doi.org/10.1007/978-3-030-60936-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60935-1

  • Online ISBN: 978-3-030-60936-8

  • eBook Packages: Computer Science, Computer Science (R0)
