($k,\varepsilon,\delta$)-Anonymization: privacy-preserving data release based on k-anonymity and differential privacy

Tsou, Yao-Tung; Alraja, Mansour Naser; Chen, Li-Sheng; Chang, Yu-Hsiang; Hu, Yung-Li; Huang, Yennun; Yu, Chia-Mu; Tsai, Pei-Yuan

doi:10.1007/s11761-021-00324-2

Special Issue Paper
Published: 06 August 2021

Volume 15, pages 175–185, (2021)
Service Oriented Computing and Applications

Yao-Tung Tsou¹,
Mansour Naser Alraja²,
Li-Sheng Chen³,
Yu-Hsiang Chang³,
Yung-Li Hu⁴,
Yennun Huang⁴,
Chia-Mu Yu⁵ &
Pei-Yuan Tsai⁶

Abstract

The General Data Protection Regulation came into effect on May 25, 2018, and has rapidly become a touchstone model for modern privacy law. It empowers consumers with unprecedented control over the use of their personal information. However, new guarantees of consumer privacy adversely affect data sharing and data application markets because service companies (e.g., Apple, Google, Microsoft) cannot provide immediate and optimized services through analysis of collected consumer experiences. Data de-identification technology (e.g., k-anonymity and differential privacy) is therefore a candidate solution for protecting the privacy of shared data. Various workarounds based on existing k-anonymity and differential privacy methods have been proposed. However, they offer limited data utility and degrade when data sets have high dimensionality (the so-called curse of dimensionality). In this paper, we propose the ($k,\varepsilon,\delta$)-anonymization synthetic data set generation mechanism (($k,\varepsilon,\delta$)-anonymization for short) to protect data privacy before data sets are released for analysis. Synthetic data sets generated by ($k,\varepsilon,\delta$)-anonymization satisfy the definitions of k-anonymity and differential privacy by applying KD-tree and random sampling mechanisms. Moreover, ($k,\varepsilon,\delta$)-anonymization uses principal component analysis to replace high-dimensional data sets with lower-dimensional ones for efficient computation. Finally, we confirm the relationships between the parameters k, $\varepsilon$, and $\delta$ for k-anonymity and ($\varepsilon,\delta$)-differential privacy and estimate the utility of ($k,\varepsilon,\delta$)-anonymization synthetic data sets. We report a privacy analysis and a series of experiments that show that ($k,\varepsilon,\delta$)-anonymization is feasible and efficient.
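To make the combination of random sampling and KD-tree partitioning named in the abstract more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes numeric records, a hypothetical sampling rate `beta`, median splits that cycle through the coordinates, and release of each cell's centroid in place of its members; the function names `kd_partition` and `synthesize` are illustrative. The principal component analysis step is omitted, and the sketch by itself does not establish a differential privacy guarantee: it only illustrates how every released value can end up shared by at least k records.

```python
import random
from statistics import mean


def kd_partition(rows, k, depth=0):
    """Split rows at the median of one coordinate (KD-tree style).

    Splitting stops when a further split would leave a cell with
    fewer than k rows, so every returned cell holds at least k rows
    (provided the input holds at least k rows).
    """
    if len(rows) < 2 * k:
        return [rows]
    axis = depth % len(rows[0])          # cycle through the coordinates
    ordered = sorted(rows, key=lambda r: r[axis])
    mid = len(ordered) // 2              # both halves have >= k rows
    return (kd_partition(ordered[:mid], k, depth + 1)
            + kd_partition(ordered[mid:], k, depth + 1))


def synthesize(rows, k, beta, seed=0):
    """Sample rows with probability beta, then release cell centroids.

    Assumption for illustration: each sampled row is replaced by the
    centroid of its KD-tree cell. Returns [] if fewer than k rows
    survive the sampling step.
    """
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < beta]
    if len(sample) < k:
        return []
    released = []
    for cell in kd_partition(sample, k):
        centroid = tuple(mean(col) for col in zip(*cell))
        released.extend([centroid] * len(cell))
    return released


# Usage with made-up two-dimensional records.
data = [(float(i), float(i % 7)) for i in range(200)]
synthetic = synthesize(data, k=5, beta=0.5, seed=1)
```

In this sketch each released tuple occurs at least k times, which is the k-anonymity property over the released attributes. How k and the sampling rate relate to $\varepsilon$ and $\delta$ is what the paper's privacy analysis addresses.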

Acknowledgements

This work was supported by the Ministry of Science and Technology, Taiwan, under grants MOST 107-2221-E-035-020-MY3 and MOST 109-2221-E-001-019-MY3; by Academia Sinica under AS-KPQ-109-DSTCP; and by The Research Council (TRC), Sultanate of Oman (Block Fund-Research Grant).

Author information

Authors and Affiliations

Department of Communications Engineering, Feng Chia University, Taichung, 407, Taiwan
Yao-Tung Tsou
Department of Management Information Systems, College of Commerce and Business Administration, Dhofar University, Salalah, Oman
Mansour Naser Alraja
Department of Communications Engineering, Feng Chia University, Taichung, 407, Taiwan
Li-Sheng Chen & Yu-Hsiang Chang
Research Center for Information Technology Innovation, Academia Sinica, Taipei, 115, Taiwan
Yung-Li Hu & Yennun Huang
Department of Information Management and Finance, National Yang Ming Chiao Tung University, Taipei, Taiwan
Chia-Mu Yu
Digital Service Innovation Institute, Institute for Information Industry, Taipei, 105, Taiwan
Pei-Yuan Tsai

Corresponding author

Correspondence to Yao-Tung Tsou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tsou, YT., Alraja, M.N., Chen, LS. et al. ($k,\varepsilon ,\delta $)-Anonymization: privacy-preserving data release based on k-anonymity and differential privacy. SOCA 15, 175–185 (2021). https://doi.org/10.1007/s11761-021-00324-2

Received: 08 February 2021
Revised: 29 May 2021
Accepted: 15 June 2021
Published: 06 August 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s11761-021-00324-2
