Clustering with Diversity

Li, Jian; Yi, Ke; Zhang, Qin

doi:10.1007/978-3-642-14165-2_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6198)

Included in the following conference series:

International Colloquium on Automata, Languages, and Programming

Abstract

We consider the clustering-with-diversity problem: given a set of colored points in a metric space, partition them into clusters so that each cluster contains at least ℓ points, all of which have distinct colors. When the objective is to minimize the maximum radius of any cluster, we give a 2-approximation algorithm for every ℓ, and we show, by providing a matching lower bound, that this approximation ratio is optimal unless P = NP. We also develop several extensions of the algorithm that handle outliers. The problem is motivated mainly by applications in privacy-preserving data publication.
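The problem statement in the abstract can be made concrete with a small exhaustive-search sketch. This is not the authors' 2-approximation algorithm, which the abstract does not describe. It only checks the diversity constraint (every cluster has at least ℓ points and no repeated color) and computes the max-radius objective exactly by trying every partition, which is exponential and usable only for a handful of points. Assumptions of this sketch: points are Euclidean coordinates compared with `math.dist`, a cluster's radius is measured from a center chosen among the cluster's own points, and the helper names (`is_diverse_partition`, `max_radius`, `set_partitions`, `brute_force_optimum`) are hypothetical.

```python
from __future__ import annotations

import math
from typing import Callable, Hashable, Iterator, Sequence

Point = Sequence[float]
Dist = Callable[[Point, Point], float]


def is_diverse_partition(colors: Sequence[Hashable], clusters: list[list[int]], ell: int) -> bool:
    """True iff `clusters` partitions indices 0..n-1, every cluster has at least
    `ell` points, and no color appears twice inside a cluster."""
    members = sorted(i for cluster in clusters for i in cluster)
    if members != list(range(len(colors))):
        return False  # some point is missing or assigned twice
    for cluster in clusters:
        if len(cluster) < ell:
            return False  # cluster too small
        if len({colors[i] for i in cluster}) != len(cluster):
            return False  # a color is repeated within this cluster
    return True


def max_radius(points: Sequence[Point], clusters: list[list[int]], dist: Dist) -> float:
    """Largest cluster radius. Each radius uses the best center chosen among the
    cluster's own points (an assumption of this sketch)."""
    return max(
        (min(max(dist(points[c], points[i]) for i in cluster) for c in cluster)
         for cluster in clusters),
        default=0.0,
    )


def set_partitions(items: list[int]) -> Iterator[list[list[int]]]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):  # add `first` to an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part  # or give `first` a block of its own


def brute_force_optimum(points, colors, ell, dist: Dist = math.dist):
    """Exact optimum by exhaustive search; exponential, for tiny inputs only.
    Returns (inf, None) when no feasible clustering exists."""
    best_cost, best = math.inf, None
    for clusters in set_partitions(list(range(len(points)))):
        if is_diverse_partition(colors, clusters, ell):
            cost = max_radius(points, clusters, dist)
            if cost < best_cost:
                best_cost, best = cost, clusters
    return best_cost, best


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
colors = ["red", "blue", "red", "blue"]
cost, clusters = brute_force_optimum(points, colors, ell=2)
print(cost)  # 1.0
print(clusters)
```

In this toy instance the only feasible clusterings pair a red point with a blue one, and the optimum (radius 1.0) pairs each red point with its nearby blue neighbour. A brute-force baseline like this is mainly useful for checking an approximation algorithm's output against the true optimum on small random inputs.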

Author information

Authors and Affiliations

University of Maryland, College Park, MD, USA
Jian Li
Hong Kong University of Science and Technology, Hong Kong, China
Ke Yi & Qin Zhang

Editor information

Editors and Affiliations

Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Samson Abramsky
Université de Bordeaux (LaBRI) & INRIA, 351, cours de la Libération, 33405, Talence Cedex, France
Cyril Gavoille
INRIA, Centre de Recherche Bordeaux – Sud-Ouest, 351 cours de la Libération, 33405, Talence Cedex, France
Claude Kirchner
Heinz Nixdorf Institute, University of Paderborn, Fürstenallee 11, 33102, Paderborn, Germany
Friedhelm Meyer auf der Heide
University of Patras and RACTI, 26500, Patras, Greece
Paul G. Spirakis

About this paper

Cite this paper

Li, J., Yi, K., Zhang, Q. (2010). Clustering with Diversity. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds) Automata, Languages and Programming. ICALP 2010. Lecture Notes in Computer Science, vol 6198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14165-2_17

DOI: https://doi.org/10.1007/978-3-642-14165-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14164-5
Online ISBN: 978-3-642-14165-2
eBook Packages: Computer Science, Computer Science (R0)
