Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk

  • Conference paper
Combinatorial Algorithms (IWOCA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8986)

Included in the following conference series:

  • International Workshop on Combinatorial Algorithms

Abstract

It is well recognised that data mining and statistical analysis pose a serious threat to privacy. This is true for financial, medical, criminal and marketing research. Numerous techniques have been proposed to protect privacy, including restriction and data modification. Recently proposed privacy models such as differential privacy and k-anonymity have received a lot of attention, and for the latter there are now several improvements of the original scheme, each removing some security shortcomings of the previous one. However, the challenge lies in evaluating and comparing the privacy provided by various techniques. In this paper we propose a novel entropy-based security measure that can be applied to any generalisation, restriction or data modification technique. We use our measure to empirically evaluate and compare a few popular methods, namely query restriction, sampling and noise addition.
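
The abstract does not define the proposed measure, so the following is only a generic sketch of how entropy can express disclosure risk, not the authors' method. It assumes an intruder whose uncertainty about a confidential value is modelled as a discrete probability distribution and computes its Shannon entropy in bits; lower entropy means the intruder is closer to certainty. The function names `shannon_entropy` and `intruder_entropy`, and the assumption that every candidate record is equally likely, are illustrative choices.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def intruder_entropy(candidate_values):
    """Entropy of an intruder's belief about a confidential value.

    Assumption (hypothetical model): the intruder has narrowed the target
    down to a list of candidate records and considers each one equally likely.
    """
    counts = Counter(candidate_values)
    total = sum(counts.values())
    return shannon_entropy(c / total for c in counts.values())


# Four equally likely diagnoses: maximal uncertainty for four outcomes.
before = intruder_entropy(["flu", "asthma", "diabetes", "angina"])
# A released statistic leaves four candidate records, three sharing a value.
after = intruder_entropy(["flu", "flu", "flu", "asthma"])
print(f"before: {before:.3f} bits, after: {after:.3f} bits")
```

Under this toy model, comparing the entropy that remains after different protection techniques gives one way to rank them, which is the kind of comparison the abstract describes.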

Author information

Correspondence to Ljiljana Brankovic.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Alfalayleh, M., Brankovic, L. (2015). Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk. In: Jan, K., Miller, M., Froncek, D. (eds) Combinatorial Algorithms. IWOCA 2014. Lecture Notes in Computer Science, vol 8986. Springer, Cham. https://doi.org/10.1007/978-3-319-19315-1_3

  • DOI: https://doi.org/10.1007/978-3-319-19315-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19314-4

  • Online ISBN: 978-3-319-19315-1

  • eBook Packages: Computer Science, Computer Science (R0)