Abstract
The General Data Protection Regulation (GDPR) came into effect on May 25, 2018, and has rapidly become a touchstone for modern privacy law. It gives consumers unprecedented control over the use of their personal information. However, these new guarantees of consumer privacy adversely affect data sharing and data application markets, because service companies (e.g., Apple, Google, Microsoft) can no longer provide immediate and optimized services by analyzing collected consumer experiences. Data de-identification technology (e.g., k-anonymity and differential privacy) is therefore a candidate solution for protecting the privacy of shared data. Various workarounds based on these methods have been proposed, but they are limited in data utility and suffer from the high dimensionality of the data sets (the so-called curse of dimensionality). In this paper, we propose the (\(k,\varepsilon ,\delta \))-anonymization synthetic data set generation mechanism (called (\(k,\varepsilon ,\delta \))-anonymization for short), which protects data privacy before data sets are released for analysis. Synthetic data sets generated by (\(k,\varepsilon ,\delta \))-anonymization satisfy the definitions of k-anonymity and differential privacy by applying KD-tree and random sampling mechanisms. Moreover, (\(k,\varepsilon ,\delta \))-anonymization uses principal component analysis to replace high-dimensional data sets with lower-dimensional ones for efficient computation. Finally, we confirm the relationships between the parameters k, \(\varepsilon \), and \(\delta \) for k-anonymity and (\(\varepsilon ,\delta \))-differential privacy, and estimate the utility of (\(k,\varepsilon ,\delta \))-anonymization synthetic data sets. We report a privacy analysis and a series of experiments that demonstrate that (\(k,\varepsilon ,\delta \))-anonymization is feasible and efficient.
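As a rough illustration of how random sampling and a KD-tree partition can be combined to release a k-anonymous synthetic data set, the following minimal sketch may help. It is not the authors' mechanism: the function names `kd_partition` and `synthesize`, the sampling rate `beta`, the median-split rule, and the use of cell centroids as synthetic records are all assumptions made for illustration. The principal component analysis step is omitted, and no differential privacy guarantee is claimed for this code.

```python
import random
from statistics import fmean


def kd_partition(rows, k, depth=0):
    """Split rows (lists of floats) KD-tree style at the median.

    A node is split only when it holds at least 2*k rows, so both
    halves keep at least k rows. Hypothetical helper, not from the paper.
    """
    if len(rows) < 2 * k:
        return [rows]
    axis = depth % len(rows[0])          # cycle through the attributes
    ordered = sorted(rows, key=lambda r: r[axis])
    mid = len(ordered) // 2              # median position on this axis
    return (kd_partition(ordered[:mid], k, depth + 1)
            + kd_partition(ordered[mid:], k, depth + 1))


def synthesize(rows, k, beta, seed=None):
    """Sample each row with probability beta, partition the sample into
    cells of at least k rows, and replace every row by its cell centroid.

    `beta` is an assumed sampling rate; returns [] when fewer than k
    rows survive sampling (the whole release is suppressed).
    """
    rng = random.Random(seed)
    sampled = [r for r in rows if rng.random() < beta]
    if len(sampled) < k:
        return []
    synthetic = []
    for cell in kd_partition(sampled, k):
        centroid = [fmean(col) for col in zip(*cell)]
        # Every row in the cell becomes an identical synthetic record,
        # so each released record is shared by at least k rows.
        synthetic.extend(list(centroid) for _ in cell)
    return synthetic


if __name__ == "__main__":
    data = [[float(i), float((i * 7) % 13)] for i in range(100)]
    released = synthesize(data, k=5, beta=0.8, seed=1)
    print(len(released), "synthetic records released")
```

In this sketch every released record is identical to at least k - 1 others, which is the k-anonymity property. How k and the sampling rate relate to \(\varepsilon \) and \(\delta \) is the subject of the paper's privacy analysis and is not reproduced here.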
Acknowledgements
This work is supported by the Ministry of Science and Technology, Taiwan, under grants MOST 107-2221-E-035-020-MY3 and MOST 109-2221-E-001-019-MY3; by Academia Sinica under AS-KPQ-109-DSTCP; and by The Research Council (TRC), Sultanate of Oman (Block Fund-Research Grant).
Cite this article
Tsou, YT., Alraja, M.N., Chen, LS. et al. (\(k,\varepsilon ,\delta \))-Anonymization: privacy-preserving data release based on k-anonymity and differential privacy. SOCA 15, 175–185 (2021). https://doi.org/10.1007/s11761-021-00324-2
DOI: https://doi.org/10.1007/s11761-021-00324-2