Skip to main content

Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences

  • Conference paper
Algorithms and Computation (ISAAC 2009)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5878))

Included in the following conference series:

Abstract

The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice. Most of the theoretical work is restricted to the case that squared Euclidean distances are used as similarity measure. In many applications, however, data is to be clustered with respect to other measures like, e.g., relative entropy, which is commonly used to cluster web pages. In this paper, we analyze the running-time of the k-means method for Bregman divergences, a very general class of similarity measures including squared Euclidean distances and relative entropy. We show that the exponential lower bound known for the Euclidean case carries over to almost every Bregman divergence. To narrow the gap between theory and practice, we also study k-means in the semi-random input model of smoothed analysis. For the case that n data points in ℝd are perturbed by noise with standard deviation σ, we show that for almost arbitrary Bregman divergences the expected running-time is bounded by \({\rm poly}(n^{\sqrt k}, 1/\sigma)\) and k kd ·poly(n, 1/σ).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: Proc. of the 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 1088–1097 (2009)

    Google Scholar 

  2. Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric distance measures. In: Proc. of the 19th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 799–808 (2008)

    Google Scholar 

  3. Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proc. of the 50th Ann. IEEE Symp. on Found. of Computer Science, FOCS (to appear, 2009)

    Google Scholar 

  4. Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM Journal on Computing 39(2), 766–782 (2009)

    Article  MathSciNet  Google Scholar 

  5. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)

    MathSciNet  Google Scholar 

  6. Berkhin, P.: Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, USA (2002)

    Google Scholar 

  7. Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information-theoretic feature clustering algorithm for text classification. Journal of Machine Learning Research 3, 1265–1287 (2003)

    Article  MATH  Google Scholar 

  8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Chichester (2000)

    Google Scholar 

  9. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. II. John Wiley & Sons, Chichester (1971)

    MATH  Google Scholar 

  10. Gray, R.M., Buzo, A., Gray Jr., A.H., Matsuyama, Y.: Distortion measures for speech processing. IEEE Transactions on Acoustics, Speech, and Signal Processing 28(4), 367–376 (1980)

    Article  MATH  Google Scholar 

  11. Inaba, M., Katoh, N., Imai, H.: Variance-based k-clustering algorithms by Voronoi diagrams and randomization. IEICE Transactions on Information and Systems E83-D(6), 1199–1206 (2000)

    Google Scholar 

  12. Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)

    Article  MATH  MathSciNet  Google Scholar 

  13. Manthey, B., Röglin, H.: Improved smoothed analysis of the k-means method. In: Proc. of the 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 461–470 (2009)

    Google Scholar 

  14. Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM 51(3), 385–463 (2004)

    Article  MathSciNet  Google Scholar 

  15. Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proc. of the 25th ACM Symp. on Computational Geometry (SoCG), pp. 324–332 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Manthey, B., Röglin, H. (2009). Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences. In: Dong, Y., Du, DZ., Ibarra, O. (eds) Algorithms and Computation. ISAAC 2009. Lecture Notes in Computer Science, vol 5878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10631-6_103

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-10631-6_103

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10630-9

  • Online ISBN: 978-3-642-10631-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics