Evaluating Fuzzy Clustering Algorithms for Microdata Protection

Torra, Vicenç; Miyamoto, Sadaaki

doi:10.1007/978-3-540-25955-8_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3050)

Included in the following conference series:

International Workshop on Privacy in Statistical Databases

Abstract

Microaggregation is a well-known technique for data protection. It is usually defined operationally as a two-step process: (i) a large number of small clusters are built from the data and (ii) the data are replaced by cluster aggregates. In this work we study the use of fuzzy clustering in the first step. In particular, we consider standard fuzzy c-means and entropy-based fuzzy c-means. For both methods, our study includes variable-size and non-variable-size variations. The resulting masking methods are compared using standard scoring methods.
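
The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fuzzifier `m`, the regularization weight `lam`, the fixed iteration count, and the rule of assigning each record to its highest-membership cluster before averaging are all assumptions made for illustration. The sketch covers only the textbook membership updates for standard and entropy-regularized fuzzy c-means; it does not implement the variable-size variations and enforces no minimum cluster size.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fcm_memberships(d2, m=2.0):
    """Standard fuzzy c-means memberships of one record,
    given its squared distances to every cluster center."""
    inv = [max(d, 1e-12) ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]

def entropy_memberships(d2, lam=1.0):
    """Entropy-regularized memberships of one record:
    a softmax over -lam * squared distance."""
    lo = min(d2)  # shift by the minimum for numerical stability
    e = [math.exp(-lam * (d - lo)) for d in d2]
    total = sum(e)
    return [v / total for v in e]

def fuzzy_microaggregate(X, c, m=2.0, lam=None, iters=50, seed=0):
    """Hypothetical helper. Step (i): fuzzy clustering into c clusters.
    Step (ii): replace each record by the mean of its cluster."""
    rng = random.Random(seed)
    V = [list(x) for x in rng.sample(X, c)]  # initial centers drawn from the data
    n, dim = len(X), len(X[0])
    for _ in range(iters):
        # Memberships of every record in every cluster (n rows, c columns).
        if lam is None:
            U = [fcm_memberships([sq_dist(x, v) for v in V], m) for x in X]
            W = [[u ** m for u in row] for row in U]  # weights u^m for centers
        else:
            U = [entropy_memberships([sq_dist(x, v) for v in V], lam) for x in X]
            W = U  # entropy variant weights records by u directly
        # Center update: weighted mean of all records.
        V = [[sum(W[k][i] * X[k][j] for k in range(n)) /
              sum(W[k][i] for k in range(n)) for j in range(dim)]
             for i in range(c)]
    # Assumption: harden the partition by highest membership
    # (memberships from the last iteration).
    labels = [max(range(c), key=lambda i: row[i]) for row in U]
    groups = {}
    for k, lab in enumerate(labels):
        groups.setdefault(lab, []).append(k)
    means = {lab: [sum(X[k][j] for k in idx) / len(idx) for j in range(dim)]
             for lab, idx in groups.items()}
    masked = [list(means[lab]) for lab in labels]
    return masked, labels

if __name__ == "__main__":
    data = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
            [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
    masked, labels = fuzzy_microaggregate(data, c=2)            # standard FCM
    masked_e, labels_e = fuzzy_microaggregate(data, c=2, lam=1.0)  # entropy-based
    print(labels, labels_e)
```

Because each record is replaced by the mean of its own group, the masked file keeps the column means of the original data while containing at most `c` distinct records. A practical masking method would also need to control cluster sizes, which this sketch leaves out.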

Author information

Authors and Affiliations

Institut d’Investigació en Intel·ligència Artificial (IIIA-CSIC), E-08193, Bellaterra, Catalonia
Vicenç Torra
Institute of Engineering Mechanics and Systems, University of Tsukuba, Ibaraki, 305-8573, Japan
Sadaaki Miyamoto

Editor information

Editors and Affiliations

Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, UNESCO Chair in Data Privacy, Av. Països Catalans 26, E-43007, Tarragona, Catalonia
Josep Domingo-Ferrer
IIIA, Artificial Intelligence Research Institute CSIC, Spanish National Research Council, Campus UAB s/n, 08193, Bellaterra, Catalonia, Spain
Vicenç Torra

About this paper

Cite this paper

Torra, V., Miyamoto, S. (2004). Evaluating Fuzzy Clustering Algorithms for Microdata Protection. In: Domingo-Ferrer, J., Torra, V. (eds) Privacy in Statistical Databases. PSD 2004. Lecture Notes in Computer Science, vol 3050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-25955-8_14

DOI: https://doi.org/10.1007/978-3-540-25955-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22118-0
Online ISBN: 978-3-540-25955-8
eBook Packages: Springer Book Archive
