Abstract
When microdata files are released, re-identification of a record discloses the values of a possibly large set of sensitive variables. Statistical agencies releasing such files must therefore assess the associated disclosure risk carefully.
An informed decision requires risk estimators that are as accurate and precise as possible. These properties affect the whole risk assessment process, and agencies should choose the estimator that performs best. In practice, estimators may perform poorly, especially for those records whose real risk is higher. To improve estimation, we propose to introduce external information from a previous census, as is done in small area estimation (see [10]). In [4] we considered SPREE-type estimators, which use the association structure observed at a previous census (see [9]); in this paper we consider models that use the structure of a population contingency table while allowing that structure to vary smoothly. To assess the statistical properties of this estimator and compare it with alternative approaches, we present the results of a simulation study based on a complex sampling scheme typical of most household surveys in Italy. The comparison is with a simple SPREE estimator and a Skinner-type estimator [13,6], applied to a complex sampling scheme.
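As a rough illustration of the SPREE idea mentioned above, the sketch below rescales a previous-census contingency table by iterative proportional fitting so that it matches current row and column totals while keeping the census association structure (the odds ratios). It is a minimal sketch under stated assumptions, not the estimator studied in the paper: the function name `spree_ipf`, the two-way layout, and the toy numbers are all hypothetical, and the census table is assumed to have strictly positive cells.

```python
import numpy as np


def spree_ipf(census_table, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Rescale a census table to new margins by iterative proportional fitting.

    Hypothetical helper: assumes a two-way table with strictly positive cells
    and row/column totals that sum to the same grand total.
    """
    est = np.asarray(census_table, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        # Scale each row so that row sums match the current row totals.
        est *= (row_totals / est.sum(axis=1))[:, None]
        # Scale each column so that column sums match the current column totals.
        est *= (col_totals / est.sum(axis=0))[None, :]
        # Column sums are now exact; stop once row sums are also close enough.
        if np.allclose(est.sum(axis=1), row_totals, rtol=0.0, atol=tol):
            break
    return est


# Toy example (made-up numbers): census counts and current margin estimates.
census = np.array([[10.0, 20.0], [30.0, 40.0]])
est = spree_ipf(census, row_totals=[40.0, 60.0], col_totals=[50.0, 50.0])
```

Because each step only multiplies whole rows or whole columns by a constant, the cross-product ratio of the fitted table equals that of the census table; only the margins are updated.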
References
1. Chen, G., Keller-McNulty, S.: Estimation of identification disclosure risk in microdata. Journal of Official Statistics 14, 79–95 (1998)
2. EURAREA Consortium: Project Reference, vol. 1 (2004), https://www.statistics.gov.uk/eurarea
3. Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. Journal of the American Statistical Association 87, 367–382 (1992)
4. Di Consiglio, L., Polettini, S.: Improving individual risk estimators. In: Domingo-Ferrer, J., Franconi, L. (eds.) PSD 2006. LNCS, vol. 4302, pp. 243–256. Springer, Heidelberg (2006)
5. Dykstra, R.L.: An iterative procedure for obtaining i-projections onto the intersection of convex sets. The Annals of Probability 13, 975–984 (1985)
6. Elamir, E.A.H., Skinner, C.J.: Record level measures of disclosure risk for survey microdata. Journal of Official Statistics 22(3), 525–539 (2006)
7. Fienberg, S.E., Makov, U.E.: Confidentiality, uniqueness, and disclosure limitation for categorical data. Journal of Official Statistics 14, 385–397 (1998)
8. Forster, J.J., Webb, E.L.: Bayesian disclosure risk assessment: predicting small frequencies in contingency tables. Journal of the Royal Statistical Society, Series C 56(5), 551–570 (2007)
9. Purcell, N.J., Kish, L.: Postcensal estimates for local areas (small domains). International Statistical Review 48, 3–18 (1980)
10. Rao, J.N.K.: Small area estimation. John Wiley & Sons, Hoboken (2003)
11. Rinott, Y., Shlomo, N.: Variances and confidence intervals for sample disclosure risk measures. In: Proceedings of the 56th Session of the ISI, Lisbon, August 22–29 (2007)
12. Skinner, C.J., Elliot, M.J.: A measure of disclosure risk for microdata. Journal of the Royal Statistical Society, Series B 64, 855–867 (2002)
13. Skinner, C.J., Holmes, D.J.: Estimating the re-identification risk per record in microdata. Journal of Official Statistics 14, 361–372 (1998)
14. Skinner, C.J., Shlomo, N.: Assessing identification risk in survey micro-data using log linear models. Technical Report 14, S3RI Methodology Working Papers Series (2006), http://eprints.soton.ac.uk/41842/01/s3ri-workingpaper-m06-14.pdf
15. Zhang, L., Chambers, R.L.: Small area estimates for cross-classifications. Journal of the Royal Statistical Society, Series B 66(2), 479–496 (2004)
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Di Consiglio, L., Polettini, S. (2008). Use of Auxiliary Information in Risk Estimation. In: Domingo-Ferrer, J., Saygın, Y. (eds) Privacy in Statistical Databases. PSD 2008. Lecture Notes in Computer Science, vol 5262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87471-3_18
DOI: https://doi.org/10.1007/978-3-540-87471-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87470-6
Online ISBN: 978-3-540-87471-3
eBook Packages: Computer Science, Computer Science (R0)