Clustering with Diversity

  • Conference paper
Automata, Languages and Programming (ICALP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6198)

Abstract

We consider the clustering with diversity problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least ℓ points, all of which have distinct colors. We give a 2-approximation to this problem for any ℓ when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless P = NP, by providing a matching lower bound. Several extensions to our algorithm have also been developed for handling outliers. This problem is mainly motivated by applications in privacy-preserving data publication.
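
To make the problem statement concrete, the following minimal Python sketch checks whether a given partition is feasible under the definition above and, if so, evaluates the maximum cluster radius. It is not the paper's 2-approximation algorithm, only an illustration of the constraints and the objective. The function names, the Euclidean metric, and the choice to measure a cluster's radius from a center taken among its own points are all assumptions made for this example.

```python
from math import dist  # Euclidean distance; stands in for an arbitrary metric


def is_diverse(cluster, colors, ell):
    """A cluster is feasible if it has at least ell points, all distinctly colored."""
    cluster_colors = [colors[i] for i in cluster]
    return len(cluster) >= ell and len(set(cluster_colors)) == len(cluster_colors)


def radius(cluster, points):
    """Radius of a cluster, assuming the center must be one of its own points."""
    return min(
        max(dist(points[center], points[p]) for p in cluster)
        for center in cluster
    )


def max_radius(clusters, points, colors, ell):
    """Return the maximum cluster radius, or None if the clustering is infeasible."""
    covered = sorted(i for cluster in clusters for i in cluster)
    if covered != list(range(len(points))):
        return None  # not a partition: some point is missing or repeated
    if not all(is_diverse(cluster, colors, ell) for cluster in clusters):
        return None  # some cluster is too small or repeats a color
    return max(radius(cluster, points) for cluster in clusters)


# Hypothetical toy instance: five colored points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
colors = ["red", "blue", "green", "red", "blue"]

good = [[0, 1, 2], [3, 4]]  # every cluster has distinct colors
bad = [[0, 3], [1, 2, 4]]   # the first cluster contains two red points

print(max_radius(good, points, colors, ell=2))
print(max_radius(bad, points, colors, ell=2))
```

On this toy instance the first partition is feasible for ℓ = 2 but not for ℓ = 3, because its second cluster has only two points; the second partition is infeasible for any ℓ because one cluster repeats a color.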

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, J., Yi, K., Zhang, Q. (2010). Clustering with Diversity. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds) Automata, Languages and Programming. ICALP 2010. Lecture Notes in Computer Science, vol 6198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14165-2_17

  • DOI: https://doi.org/10.1007/978-3-642-14165-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14164-5

  • Online ISBN: 978-3-642-14165-2

  • eBook Packages: Computer Science, Computer Science (R0)
