Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification Disclosure Risks in Partially Synthetic Data

Drechsler, Jörg; Reiter, Jerome P.

doi:10.1007/978-3-540-87471-3_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5262)

Included in the following conference series:

International Conference on Privacy in Statistical Databases

Abstract

Partially synthetic data comprise the units originally surveyed with some collected values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple draws from statistical models. Because the original records remain on the file, intruders may be able to link those records to external databases, even though values are synthesized. We illustrate how statistical agencies can evaluate the risks of identification disclosures before releasing such data. We compute risk measures when intruders know who is in the sample and when the intruders do not know who is in the sample. We use classification and regression trees to synthesize data from the U.S. Current Population Survey.
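
The kind of risk evaluation described above can be illustrated with a minimal sketch. This is not the chapter's implementation. It assumes an intruder who matches exactly on a tuple of key variables, averages the resulting match probabilities over the released synthetic datasets, and declares the records with the highest averaged probability to be the match. When the intruder does not know who is in the sample, the sketch assumes the per-dataset match probability is capped at one over a population count for that key combination. All function names, the `population_counts` input, and the toy records are hypothetical.

```python
import math


def match_probabilities(target_key, synthetic_sets, population_count=None):
    """Average the intruder's per-record match probability over all released datasets.

    synthetic_sets: list of datasets; each dataset is a list of key tuples,
    with record i referring to the same sampled unit in every dataset.
    population_count: assumed number of population units sharing target_key,
    or None when the intruder knows the target is in the sample.
    """
    n_records = len(synthetic_sets[0])
    probs = [0.0] * n_records
    for data in synthetic_sets:
        hits = [i for i, key in enumerate(data) if key == target_key]
        if not hits:
            continue  # no record matches the target in this dataset
        denom = len(hits)
        if population_count is not None:
            # Assumed rule: probability is min(1/F, 1/N) per matching record.
            denom = max(denom, population_count)
        for i in hits:
            probs[i] += 1.0 / denom
    return [p / len(synthetic_sets) for p in probs]


def expected_match_risk(targets, synthetic_sets, population_counts=None):
    """Sum, over targets, of 1/c when the true record is among the c top-ranked records."""
    risk = 0.0
    for true_index, key in targets:
        count = None if population_counts is None else population_counts[key]
        probs = match_probabilities(key, synthetic_sets, count)
        best = max(probs)
        if best == 0.0:
            continue  # the intruder finds no candidate for this target
        top = [i for i, p in enumerate(probs) if math.isclose(p, best)]
        if true_index in top:
            risk += 1.0 / len(top)
    return risk


# Toy example: keys are (age, sex); age is synthesized in two released datasets.
synthetic_sets = [
    [(30, "F"), (30, "F"), (45, "M"), (52, "F")],
    [(30, "F"), (41, "F"), (45, "M"), (52, "F")],
]
# Each target is (index of the true record, true key values known to the intruder).
targets = [(0, (30, "F")), (2, (45, "M"))]
population_counts = {(30, "F"): 3, (45, "M"): 1}

risk_known_sample = expected_match_risk(targets, synthetic_sets)
risk_unknown_sample = expected_match_risk(targets, synthetic_sets, population_counts)
```

In this toy setup, capping by the assumed population count lowers the match probability the intruder assigns to each candidate record, which is the effect of sampling uncertainty that the abstract refers to. An agency would replace the toy inputs with its own synthetic files and its own assumptions about intruder knowledge.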

Author information

Authors and Affiliations

Institute for Employment Research, 90478, Nuremberg, Germany
Jörg Drechsler
Duke University, Durham, NC 27708, USA
Jerome P. Reiter

Editor information

Josep Domingo-Ferrer, Yücel Saygın

About this paper

Cite this paper

Drechsler, J., Reiter, J.P. (2008). Accounting for Intruder Uncertainty Due to Sampling When Estimating Identification Disclosure Risks in Partially Synthetic Data. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_19

DOI: https://doi.org/10.1007/978-3-540-87471-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87470-6
Online ISBN: 978-3-540-87471-3
eBook Packages: Computer Science, Computer Science (R0)