Abstract
We consider the clustering with diversity problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least ℓ points, all of which have distinct colors. We give a 2-approximation to this problem for any ℓ when the objective is to minimize the maximum radius of any cluster. We show that this approximation ratio is optimal unless P = NP by providing a matching lower bound. We also develop several extensions of the algorithm for handling outliers. The problem is mainly motivated by applications in privacy-preserving data publication.
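To make the problem statement concrete, the following minimal sketch checks whether a candidate clustering satisfies the diversity constraint and evaluates the maximum-radius objective. It is only an illustration of the definition, not the 2-approximation algorithm from the paper. The function names, the Euclidean example points, and the choice to measure a cluster's radius from the best center among its own points are all assumptions made here for illustration.

```python
from math import dist  # Euclidean distance, used here as an example metric


def is_partition(clusters, n):
    """True if every point index 0..n-1 appears in exactly one cluster."""
    seen = sorted(i for cluster in clusters for i in cluster)
    return seen == list(range(n))


def is_diverse(cluster, colors, ell):
    """True if the cluster has at least ell points, all with distinct colors."""
    cluster_colors = [colors[i] for i in cluster]
    return len(cluster) >= ell and len(set(cluster_colors)) == len(cluster_colors)


def cluster_radius(cluster, points):
    """Radius of one cluster.

    Assumption: the center is restricted to the cluster's own points,
    and the radius is the largest distance from the best such center.
    """
    return min(
        max(dist(points[c], points[p]) for p in cluster)
        for c in cluster
    )


def max_radius(clusters, points, colors, ell):
    """Objective value of a clustering, or None if it is infeasible."""
    if not is_partition(clusters, len(points)):
        return None
    if not all(is_diverse(cluster, colors, ell) for cluster in clusters):
        return None
    return max(cluster_radius(cluster, points) for cluster in clusters)


# Hypothetical example data: six points, three colors, ell = 3.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
colors = ["r", "g", "b", "r", "g", "b"]

feasible = max_radius([[0, 1, 2], [3, 4, 5]], points, colors, 3)
infeasible = max_radius([[0, 3, 1], [2, 4, 5]], points, colors, 3)
print(feasible, infeasible)
```

In this example the first clustering is feasible because each cluster holds three points of three different colors, while the second is rejected because one cluster contains two points with color `r`.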
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, J., Yi, K., Zhang, Q. (2010). Clustering with Diversity. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds) Automata, Languages and Programming. ICALP 2010. Lecture Notes in Computer Science, vol 6198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14165-2_17
DOI: https://doi.org/10.1007/978-3-642-14165-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14164-5
Online ISBN: 978-3-642-14165-2
eBook Packages: Computer Science, Computer Science (R0)