Locality-Sensitive Bloom Filter for Approximate Membership Query

Searchable Storage in Cloud Computing

Abstract

In many network applications, Bloom filters support exact-matching membership queries because they are randomized, space-efficient data structures with a small probability of false answers. We extend the standard Bloom filter to the Locality-Sensitive Bloom Filter (LSBF) to provide an Approximate Membership Query (AMQ) service. We achieve this by replacing the uniform and independent hash functions with locality-sensitive hash functions, which makes the storage in LSBF locality sensitive. Because it retains the Bloom filter design, LSBF remains space efficient and responds to queries quickly. In the LSBF structure, we propose a bit vector to reduce False Positives (FP); the bit vector can verify that multiple attributes belong to one member. We also use an active overflowed scheme to significantly decrease False Negatives (FN). Rigorous theoretical analysis (e.g., of FP, FN, and space overhead) shows that LSBF is space compact and can respond accurately to approximate membership queries. We have implemented LSBF in a real distributed system and performed extensive experiments using real-world traces. Experimental results show that LSBF, compared with a baseline approach and other state-of-the-art work in the literature (SmartStore and LSB-tree), takes less time to respond to AMQ and consumes much less storage space (© 2012 IEEE. Reprinted, with permission, from Ref. [1]).
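
The core idea in the abstract, a Bloom filter whose bit positions come from locality-sensitive hash functions, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the p-stable (Gaussian projection) LSH family of Ref. [28], which the abstract does not name, and it omits the verification bit vector and the active overflowed scheme. The class name `LSBFSketch` and all parameters are hypothetical.

```python
import math
import random


class LSBFSketch:
    """Toy Bloom filter whose bit positions come from p-stable LSH.

    Illustrative assumption only: it leaves out the chapter's
    verification bit vector and active overflowed scheme.
    """

    def __init__(self, num_bits, num_hashes, dim, bucket_width, seed=0):
        rng = random.Random(seed)
        self.num_bits = num_bits
        self.bucket_width = bucket_width
        self.bits = [False] * num_bits
        # One (a, b) pair per hash function:
        # a has i.i.d. N(0, 1) entries, b is uniform in [0, bucket_width).
        self.params = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)],
             rng.uniform(0.0, bucket_width))
            for _ in range(num_hashes)
        ]

    def _positions(self, point):
        # h(v) = floor((a . v + b) / w), folded into the bit array.
        for a, b in self.params:
            projection = sum(ai * vi for ai, vi in zip(a, point))
            bucket = math.floor((projection + b) / self.bucket_width)
            yield bucket % self.num_bits

    def insert(self, point):
        for pos in self._positions(point):
            self.bits[pos] = True

    def query(self, point):
        # True means "possibly close to a stored item"; False means
        # at least one hashed bit is unset.
        return all(self.bits[pos] for pos in self._positions(point))


lsbf = LSBFSketch(num_bits=1024, num_hashes=4, dim=3, bucket_width=4.0, seed=1)
lsbf.insert([1.0, 2.0, 3.0])
print(lsbf.query([1.0, 2.0, 3.0]))
```

Because nearby points tend to fall into the same projection bucket, a query close to a stored item is likely, though not guaranteed, to find all of its bits set. A close query that lands just across a bucket boundary is missed, which is the kind of false negative the active overflowed scheme in the abstract is meant to reduce.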

References

  1. Y. Hua, B. Xiao, B. Veeravalli, D. Feng, Locality-sensitive bloom filter for approximate membership query. IEEE Trans. Comput. (TC) 61(6), 817–830 (2012)

  2. L. Carter, R. Floyd, J. Gill, G. Markowsky, and M. Wegman, Exact and approximate membership testers, in Proceedings of STOC (1978), pp. 59–65

  3. Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of VLDB (2007), pp. 950–961

  4. F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, G. Varghese, Beyond bloom filters: from approximate membership checks to approximate state machines, in Proceedings of ACM SIGCOMM (2006)

  5. Y. Zhu, H. Jiang, False rate analysis of Bloom filter replicas in distributed systems, in Proceedings of ICPP (2006), pp. 255–262

  6. W. Feng, D.D. Kandlur, D. Saha, K.G. Shin, Stochastic fair blue: a queue management algorithm for enforcing fairness, in Proceedings of INFOCOM (2001)

  7. F.M. Cuenca-Acuna, C. Peery, R.P. Martin, T.D. Nguyen, PlanetP: using gossiping to build content addressable peer-to-peer information sharing communities, in IEEE HPDC (2003)

  8. A. Pagh, R. Pagh, S. Rao, An optimal bloom filter replacement, in Proceedings of SODA (2005), pp. 823–829

  9. S. Dharmapurikar, P. Krishnamurthy, D.E. Taylor, Longest prefix matching using bloom filters, in Proceedings of ACM SIGCOMM (2003), pp. 201–212

  10. A. Broder, M. Mitzenmacher, Using multiple hash functions to improve IP lookups, in Proceedings of INFOCOM (2001), pp. 1454–1463

  11. F. Baboescu, G. Varghese, Scalable packet classification. IEEE/ACM Trans. Netw. 13(1), 2–14 (2005)

  12. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of STOC (1998), pp. 604–613

  13. A. Kirsch, M. Mitzenmacher, Distance-sensitive bloom filters, in Proceedings of Algorithm Engineering and Experiments (ALENEX) (2006)

  14. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 1, 117–122 (2008)

  15. L. Fan, P. Cao, J. Almeida, A. Broder, Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8(3), 281–293 (2000)

  16. M. Mitzenmacher, Compressed bloom filters. IEEE/ACM Trans. Netw. 10(5), 604–612 (2002)

  17. Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Scalable and adaptive metadata management in ultra large-scale file systems, in Proceedings of ICDCS (2008), pp. 403–410

  18. A. Kumar, J.J. Xu, J. Wang, O. Spatschek, L.E. Li, Space-code bloom filter for efficient per-flow traffic measurement, in Proceedings of INFOCOM (2004), pp. 1762–1773

  19. S. Cohen, Y. Matias, Spectral bloom filters, in Proceedings of ACM SIGMOD (2003), pp. 241–252

  20. D. Guo, J. Wu, H. Chen, X. Luo, Theory and network application of dynamic bloom filters, in Proceedings of INFOCOM (2006)

  21. B. Xiao, Y. Hua, Using parallel bloom filters for multi-attribute representation on network services. IEEE Trans. Parallel Distrib. Syst. 1, 20–32 (2010)

  22. H. Song, F. Hao, M. Kodialam, T.V. Lakshman, IPv6 lookups using distributed and load balanced bloom filters for 100Gbps core router line cards, in INFOCOM (2009)

  23. F. Hao, M. Kodialam, T.V. Lakshman, H. Song, Fast multiset membership testing using combinatorial bloom filters, in Proceedings of INFOCOM (2009)

  24. F. Hao, M. Kodialam, T.V. Lakshman, Incremental bloom filters, in Proceedings of INFOCOM (2008), pp. 1741–1749

  25. A. Broder, M. Mitzenmacher, Network applications of bloom filters: a survey. Internet Math. 1, 485–509 (2005)

  26. A. Joly, O. Buisson, A posteriori multi-probe locality sensitive hashing, in Proceedings of ACM Multimedia (2008)

  27. Y. Hua, B. Xiao, D. Feng, B. Yu, Bounded LSH for similarity search in peer-to-peer file systems, in Proceedings of ICPP (2008), pp. 644–651

  28. M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Proceedings of the Annual Symposium on Computational Geometry (2004), pp. 253–262

  29. A. Andoni, M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing using stable distributions, in Nearest Neighbor Methods in Learning and Vision: Theory and Practice, ed. by T. Darrell, P. Indyk, G. Shakhnarovich (MIT Press, 2006)

  30. M. Charikar, Similarity estimation techniques from rounding algorithms, in Proceedings of STOC (2002), pp. 380–388

  31. N. Agrawal, W. Bolosky, J. Douceur, J. Lorch, A five-year study of file-system metadata, in Proceedings of FAST (2007)

  32. The Forest CoverType dataset, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/Covertype

  33. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems, in Proceedings of ACM/IEEE Supercomputing Conference (SC) (2009)

  34. Y. Tao, K. Yi, C. Sheng, P. Kalnis, Quality and efficiency in high-dimensional nearest neighbor search, in Proceedings of SIGMOD (2009)

  35. A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of ACM SIGMOD (1984), pp. 47–57

  36. A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in Proceedings of VLDB (1999), pp. 518–529

  37. A. Leung, I. Adams, E.L. Miller, Magellan: a searchable metadata architecture for large-scale file systems. Technical Report UCSC-SSRC-09-07, University of California, Santa Cruz (2009)

  38. V. Athitsos, M. Potamias, P. Papapetrou, G. Kollios, Nearest neighbor retrieval using distance-based hashing, in Proceedings of ICDE (2008)

  39. Y. Hua, Y. Zhu, H. Jiang, D. Feng, L. Tian, Supporting scalable and adaptive metadata management in ultra large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 22(4), 580–593 (2011)

  40. J. Bruck, J. Gao, A. Jiang, Weighted bloom filter, in Proceedings of the 2006 IEEE International Symposium on Information Theory (ISIT 2006) (2006), pp. 2304–2308

  41. M. Zhong, P. Lu, K. Shen, J. Seiferas, Optimizing data popularity conscious bloom filters, in PODC (2008)

  42. F. Hao, M. Kodialam, T. Lakshman, Building high accuracy Bloom filters using partitioned hashing, in Proceedings of SIGMETRICS (2007), pp. 277–288

  43. B. Donnet, B. Baynat, T. Friedman, Retouched bloom filters: allowing networked applications to trade off selected false positives against false negatives, in Proceedings of ACM CoNEXT (2006)

Author information

Corresponding author

Correspondence to Yu Hua.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). Locality-Sensitive Bloom Filter for Approximate Membership Query. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_5

  • DOI: https://doi.org/10.1007/978-981-13-2721-6_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2720-9

  • Online ISBN: 978-981-13-2721-6

  • eBook Packages: Computer Science, Computer Science (R0)
