Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences

Manthey, Bodo; Röglin, Heiko

doi:10.1007/978-3-642-10631-6_103

Bodo Manthey¹⁹ &
Heiko Röglin²⁰

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5878))

Included in the following conference series:

International Symposium on Algorithms and Computation

1758 Accesses
6 Citations

Abstract

The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice. Most of the theoretical work is restricted to the case that squared Euclidean distances are used as similarity measure. In many applications, however, data is to be clustered with respect to other measures like, e.g., relative entropy, which is commonly used to cluster web pages. In this paper, we analyze the running-time of the k-means method for Bregman divergences, a very general class of similarity measures including squared Euclidean distances and relative entropy. We show that the exponential lower bound known for the Euclidean case carries over to almost every Bregman divergence. To narrow the gap between theory and practice, we also study k-means in the semi-random input model of smoothed analysis. For the case that n data points in ℝ^d are perturbed by noise with standard deviation σ, we show that for almost arbitrary Bregman divergences the expected running-time is bounded by \({\rm poly}(n^{\sqrt k}, 1/\sigma)\) and k ^kd ·poly(n, 1/σ).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Bregman divergences. In: Proc. of the 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 1088–1097 (2009)
Google Scholar
Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric distance measures. In: Proc. of the 19th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 799–808 (2008)
Google Scholar
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proc. of the 50th Ann. IEEE Symp. on Found. of Computer Science, FOCS (to appear, 2009)
Google Scholar
Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM Journal on Computing 39(2), 766–782 (2009)
Article MathSciNet Google Scholar
Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6, 1705–1749 (2005)
MathSciNet Google Scholar
Berkhin, P.: Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, USA (2002)
Google Scholar
Dhillon, I.S., Mallela, S., Kumar, R.: A divisive information-theoretic feature clustering algorithm for text classification. Journal of Machine Learning Research 3, 1265–1287 (2003)
Article MATH Google Scholar
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley & Sons, Chichester (2000)
Google Scholar
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. II. John Wiley & Sons, Chichester (1971)
MATH Google Scholar
Gray, R.M., Buzo, A., Gray Jr., A.H., Matsuyama, Y.: Distortion measures for speech processing. IEEE Transactions on Acoustics, Speech, and Signal Processing 28(4), 367–376 (1980)
Article MATH Google Scholar
Inaba, M., Katoh, N., Imai, H.: Variance-based k-clustering algorithms by Voronoi diagrams and randomization. IEICE Transactions on Information and Systems E83-D(6), 1199–1206 (2000)
Google Scholar
Lloyd, S.P.: Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137 (1982)
Article MATH MathSciNet Google Scholar
Manthey, B., Röglin, H.: Improved smoothed analysis of the k-means method. In: Proc. of the 20th ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 461–470 (2009)
Google Scholar
Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM 51(3), 385–463 (2004)
Article MathSciNet Google Scholar
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proc. of the 25th ACM Symp. on Computational Geometry (SoCG), pp. 324–332 (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Applied Mathematics, University of Twente,
Bodo Manthey
Department of Quantitative Economics, Maastricht University,
Heiko Röglin

Authors

Bodo Manthey
View author publications
You can also search for this author in PubMed Google Scholar
Heiko Röglin
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Dept. of Electrical and Computer Engineering, University of Hawaii, Holmes Hall 483, 2540 Dole Street, 96822, Honolulu, HI, USA
Yingfei Dong
Department of Computer Science, University of Texas at Dallas, 75083, Dallas, TX, USA
Ding-Zhu Du
Department of Computer Science, University of California, 93106, Santa Barbara, CA, USA
Oscar Ibarra

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Manthey, B., Röglin, H. (2009). Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences. In: Dong, Y., Du, DZ., Ibarra, O. (eds) Algorithms and Computation. ISAAC 2009. Lecture Notes in Computer Science, vol 5878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10631-6_103

Download citation

DOI: https://doi.org/10.1007/978-3-642-10631-6_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10630-9
Online ISBN: 978-3-642-10631-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences