Abstract
Privacy Preserving Data Mining (PPDM) is an increasingly relevant field of application. Its goal is to study new mechanisms that allow confidential data to be disseminated for data mining tasks while preserving individuals' private information. Because the R language is widely used in the statistics and data mining communities, it is a natural environment in which to research, develop and test privacy techniques aimed at data mining. In this chapter we introduce readers to the field: we present several PPDM protection techniques together with the process of evaluating their information loss and disclosure risk, and we outline some R tools that help practitioners get started.
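To make the protect-then-evaluate workflow concrete, here is a minimal sketch written in Python rather than R (the chapter's R tools are not reproduced here). It assumes purely numerical microdata stored as a list of rows. Additive Gaussian noise serves as the protection method, the mean absolute difference between original and masked values serves as a simple information loss measure, and disclosure risk is estimated as the fraction of masked records that distance-based record linkage (nearest neighbour under squared Euclidean distance) re-links to their true original record. The function names `add_noise`, `information_loss` and `linkage_risk` and the noise parameter `alpha` are hypothetical, chosen only for illustration.

```python
import random


def add_noise(data, alpha, seed=0):
    """Additive noise masking (illustrative sketch): perturb each attribute with
    Gaussian noise whose standard deviation is alpha times that attribute's
    sample standard deviation."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    sds = []
    for j in range(n_cols):
        col = [row[j] for row in data]
        mean = sum(col) / n_rows
        sds.append((sum((x - mean) ** 2 for x in col) / (n_rows - 1)) ** 0.5)
    return [[x + rng.gauss(0.0, alpha * sds[j]) for j, x in enumerate(row)]
            for row in data]


def information_loss(original, masked):
    """One simple information loss measure: mean absolute difference
    between original and masked values over all cells."""
    total = sum(abs(o - m)
                for ro, rm in zip(original, masked)
                for o, m in zip(ro, rm))
    return total / (len(original) * len(original[0]))


def linkage_risk(original, masked):
    """Distance-based record linkage: link each masked record to its nearest
    original record (squared Euclidean distance) and return the fraction of
    links that hit the correct original record."""
    correct = 0
    for i, rm in enumerate(masked):
        dists = [sum((a - b) ** 2 for a, b in zip(rm, ro)) for ro in original]
        if dists.index(min(dists)) == i:
            correct += 1
    return correct / len(masked)


if __name__ == "__main__":
    # Synthetic data for illustration only; not taken from the chapter.
    rng = random.Random(42)
    original = [[rng.uniform(0, 100), rng.uniform(0, 100)] for _ in range(200)]
    for alpha in (0.01, 0.5, 1.0):
        masked = add_noise(original, alpha)
        print(alpha,
              round(information_loss(original, masked), 2),
              linkage_risk(original, masked))
```

In runs like this, larger values of `alpha` would typically raise information loss and lower the re-linkage rate. Exposing that trade-off between utility and risk is the purpose of the evaluation process described above.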
Acknowledgments
The authors acknowledge partial support from the Spanish MICINN (projects COPRIVACY (TIN2011-27076-C03-03), N-KHRONOUS (TIN2010-15764) and ARES (CONSOLIDER INGENIO 2010 CSD2007-00004)) and from the EC (FP7/2007-2013) project Data without Boundaries (grant agreement number 262608). The first author's contribution was carried out as part of the Computer Science Ph.D. program of the Universitat Autònoma de Barcelona (UAB).
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Abril, D., Navarro-Arribas, G., Torra, V. (2015). Data Privacy with R. In: Navarro-Arribas, G., Torra, V. (eds) Advanced Research in Data Privacy. Studies in Computational Intelligence, vol 567. Springer, Cham. https://doi.org/10.1007/978-3-319-09885-2_5
DOI: https://doi.org/10.1007/978-3-319-09885-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09884-5
Online ISBN: 978-3-319-09885-2
eBook Packages: Engineering, Engineering (R0)