On the Privacy Guarantees of Synthetic Data: A Reassessment from the Maximum-Knowledge Attacker Perspective

Ruiz, Nicolas; Muralidhar, Krishnamurty; Domingo-Ferrer, Josep

doi:10.1007/978-3-319-99771-1_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11126)

Included in the following conference series:

International Conference on Privacy in Statistical Databases

Abstract

Generating synthetic data for the dissemination of individual information in a privacy-preserving way is often presented as superior to other statistical disclosure control techniques. The reason for such a claim seems straightforward at first glance: since all disseminated records are synthetic rather than actually observed values, no individual can reasonably claim to face a privacy threat. Thus, provided the synthesizer used is good enough, synthetic data can potentially always offer a high level of information with low attached disclosure risk. Building on recent advances in the literature regarding the conceptualization of an intruder, this paper challenges this claim by reassessing the privacy guarantees of synthetic data. Using the concept of a maximum-knowledge intruder, we demonstrate that synthetic data can in fact always be expressed as a re-arrangement of the original data and that, as a result, they may lead to configurations where disclosure risk is higher than for non-synthetic disclosure control approaches. We illustrate the application of these results with an empirical example.
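The claim that synthetic data can always be expressed as a re-arrangement of the original data can be illustrated with reverse mapping: each synthetic value is replaced by the original value holding the same rank within its attribute. The following is a minimal sketch of that idea, not the authors' code. It assumes numerical attributes, the same number of original and synthetic records, and ties broken by record position; the function names `ranks` and `reverse_map` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def ranks(column: List[float]) -> List[int]:
    """Return 1-based ranks of a column; ties are broken by record position (an assumption)."""
    order = sorted(range(len(column)), key=lambda i: (column[i], i))
    r = [0] * len(column)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def reverse_map(original: Matrix, synthetic: Matrix) -> Matrix:
    """Replace every synthetic value by the original value of the same rank in its attribute.

    Hypothetical helper: assumes both data sets have the same number of records.
    """
    n = len(original)
    if len(synthetic) != n:
        raise ValueError("this sketch assumes equal numbers of original and synthetic records")
    p = len(original[0])
    result = [[0.0] * p for _ in range(n)]
    for j in range(p):
        sorted_orig = sorted(row[j] for row in original)    # original values in rank order
        syn_ranks = ranks([row[j] for row in synthetic])    # rank of each synthetic value
        for l in range(n):
            result[l][j] = sorted_orig[syn_ranks[l] - 1]    # same rank, original value
    return result


if __name__ == "__main__":
    original = [[10.0, 3.0], [20.0, 1.0], [30.0, 2.0]]
    synthetic = [[25.1, 0.7], [9.4, 2.9], [31.0, 1.8]]
    print(reverse_map(original, synthetic))  # [[20.0, 1.0], [10.0, 3.0], [30.0, 2.0]]
```

Each reverse-mapped attribute is a permutation of the corresponding original attribute while keeping the ranks of the synthetic values, so the released information is exactly the synthetic rank structure expressed in original values.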

Notes

1. Using these notations, \( o_{ij} \) is the rank of attribute \( j \) in original record \( i \) and \( s_{lj}^{m} \) is the rank of attribute \( j \) in synthetic record \( l \) of the \( m \)-th synthetic data set.
2. The two other attributes are not shown here due to space constraints, but their reverse-mapped versions can be displayed in exactly the same way.
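With the rank notation of Note 1, a maximum-knowledge attacker, assumed here to hold both the original data and one released synthetic data set \( m \), can attempt record linkage directly on ranks. The sketch below is a hypothetical illustration, not necessarily the procedure used in the paper: it links each synthetic record \( l \) to the original record \( i \) that minimizes \( \sum_j |o_{ij} - s_{lj}^{m}| \). That distance, the position-based tie-breaking and the helper names `rank_matrix` and `link_by_rank` are our assumptions.

```python
from typing import List

Matrix = List[List[float]]


def ranks(column: List[float]) -> List[int]:
    """Return 1-based ranks of a column; ties are broken by record position (an assumption)."""
    order = sorted(range(len(column)), key=lambda i: (column[i], i))
    r = [0] * len(column)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def rank_matrix(data: Matrix) -> List[List[int]]:
    """Entry [i][j] is the rank of attribute j in record i (o_ij or s_lj in Note 1)."""
    p = len(data[0])
    cols = [ranks([row[j] for row in data]) for j in range(p)]
    return [[cols[j][i] for j in range(p)] for i in range(len(data))]


def link_by_rank(original: Matrix, synthetic: Matrix) -> List[int]:
    """For each synthetic record, return the index of the original record with the
    smallest sum of absolute rank differences (first index wins on ties)."""
    o = rank_matrix(original)
    s = rank_matrix(synthetic)
    links = []
    for s_l in s:
        dists = [sum(abs(a - b) for a, b in zip(o_i, s_l)) for o_i in o]
        links.append(min(range(len(o)), key=dists.__getitem__))
    return links


if __name__ == "__main__":
    original = [[10.0, 3.0], [20.0, 1.0], [30.0, 2.0]]
    synthetic = [[25.1, 0.7], [9.4, 2.9], [31.0, 1.8]]
    print(link_by_rank(original, synthetic))  # [1, 0, 2]
```

In this toy example every synthetic record matches one original record at rank distance zero, which is the kind of configuration the abstract warns about: the synthetic set behaves as a permutation of the original records that a maximum-knowledge attacker can undo.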

Acknowledgments and Disclaimer

The following funding sources are gratefully acknowledged by the third author: European Commission (project H2020-700540 “CANVAS”), Government of Catalonia (ICREA Acadèmia Prize) and Spanish Government (projects TIN2014-57364-C2-1-R “SmartGlacis” and TIN2015-70054-REDC). The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or any of the funders.

Author information

Authors and Affiliations

UNESCO Chair in Data Privacy, Department of Computer Science and Mathematics, CYBERCAT-Center for Cybersecurity Research of Catalonia, Universitat Rovira i Virgili, Av. Països Catalans 26, 43007, Tarragona, Catalonia, Spain
Nicolas Ruiz & Josep Domingo-Ferrer
Department of Marketing and Supply Chain Management, Price College of Business, University of Oklahoma, 308 Brooks Street, Norman, OK, 73019, USA
Krishnamurty Muralidhar

Corresponding author

Correspondence to Nicolas Ruiz.

Editor information

Editors and Affiliations

Rovira i Virgili University, Tarragona, Spain
Josep Domingo-Ferrer
University of Valencia, Burjassot, Spain
Francisco Montes

About this paper

Cite this paper

Ruiz, N., Muralidhar, K., Domingo-Ferrer, J. (2018). On the Privacy Guarantees of Synthetic Data: A Reassessment from the Maximum-Knowledge Attacker Perspective. In: Domingo-Ferrer, J., Montes, F. (eds) Privacy in Statistical Databases. PSD 2018. Lecture Notes in Computer Science, vol 11126. Springer, Cham. https://doi.org/10.1007/978-3-319-99771-1_5

DOI: https://doi.org/10.1007/978-3-319-99771-1_5
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99770-4
Online ISBN: 978-3-319-99771-1
eBook Packages: Computer Science, Computer Science (R0)