A Tutorial on Blocking Methods for Privacy-Preserving Record Linkage

  • Conference paper
Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9511)

Abstract

In this paper, we first present five state-of-the-art private blocking methods, which rely mainly on random strings, clustering, and public reference sets. We emphasize the drawbacks of these methods and then present our L-fold redundant blocking scheme, which relies on the Locality-Sensitive Hashing technique for identifying similar records. These records have undergone an anonymization transformation using a Bloom filter-based encoding technique. Finally, we perform an experimental evaluation of all these methods and present the results.
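
To make the idea of redundant LSH blocking over Bloom filter encodings concrete, the following is a minimal sketch and not the scheme evaluated in the paper. It assumes bigram tokenization, SHA-256-derived hash positions, and bit sampling as the LSH family for binary vectors. The function names and the parameters `size`, `num_hashes`, `L`, and `K` are illustrative choices, not values taken from the paper.

```python
import hashlib
import random
from collections import defaultdict


def bigrams(value):
    """Split a string into padded character bigrams (assumed tokenization)."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=64, num_hashes=3):
    """Encode a string as a Bloom filter bit vector of length `size`."""
    bits = [0] * size
    for gram in bigrams(value):
        for seed in range(num_hashes):
            # Derive one bit position per (seed, bigram) pair from SHA-256.
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % size] = 1
    return tuple(bits)


def make_samplers(size, L, K, seed=0):
    """Create L groups of K random bit positions, one group per hash table."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(size) for _ in range(K)) for _ in range(L)]


def block(records, samplers):
    """Insert each record into L hash tables, keyed by its sampled bits."""
    tables = [defaultdict(list) for _ in samplers]
    for rec_id, bloom in records.items():
        for table, positions in zip(tables, samplers):
            key = tuple(bloom[p] for p in positions)
            table[key].append(rec_id)
    return tables


def candidate_pairs(tables_a, tables_b):
    """Pair up records that share a bucket in at least one of the L tables."""
    pairs = set()
    for table_a, table_b in zip(tables_a, tables_b):
        for key, ids_a in table_a.items():
            for id_b in table_b.get(key, []):
                for id_a in ids_a:
                    pairs.add((id_a, id_b))
    return pairs


if __name__ == "__main__":
    samplers = make_samplers(size=64, L=4, K=8, seed=1)
    party_a = {"a1": bloom_encode("karapiperis")}
    party_b = {"b1": bloom_encode("karapiperis"), "b2": bloom_encode("verykios")}
    print(candidate_pairs(block(party_a, samplers), block(party_b, samplers)))
```

Both parties must use the same bit-position groups for their blocking keys to be comparable. In this sketch, records with identical encodings always share a bucket in every table, while similar but non-identical encodings share a bucket only with some probability per table; using L tables rather than one raises the chance that such a pair collides at least once.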

Author information

Corresponding author

Correspondence to Dimitrios Karapiperis.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Karapiperis, D., Verykios, V.S., Katsiri, E., Delis, A. (2016). A Tutorial on Blocking Methods for Privacy-Preserving Record Linkage. In: Karydis, I., Sioutas, S., Triantafillou, P., Tsoumakos, D. (eds) Algorithmic Aspects of Cloud Computing. ALGOCLOUD 2015. Lecture Notes in Computer Science, vol 9511. Springer, Cham. https://doi.org/10.1007/978-3-319-29919-8_1

  • DOI: https://doi.org/10.1007/978-3-319-29919-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29918-1

  • Online ISBN: 978-3-319-29919-8

  • eBook Packages: Computer Science (R0)
