k-Means Clustering with Outlier Detection, Mixed Variables and Missing Values

Wishart, D.

doi:10.1007/978-3-642-55721-7_23

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This paper addresses practical issues in k-means cluster analysis or segmentation with mixed types of variables and missing values. A more general k-means clustering procedure is developed that is suitable for use with very large datasets, such as arise in data mining and survey analysis. An exact assignment test guarantees that the algorithm will converge, and the detection of outliers allows the densest regions of the sample space to be mapped by tessellations of tightly-specified spherical clusters. A summary tree is obtained for the resulting k-cluster partition.
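
As a rough illustration of two of the ideas named in the abstract, the sketch below runs a plain batch (Lloyd-style) k-means in which missing values are skipped when computing distances and means, and any case farther than a fixed threshold from its nearest centre is set aside as an outlier. It is not the procedure developed in the paper: the exact assignment test, the handling of non-numeric variables, and the summary tree are not reproduced, and the function names, the use of None for a missing value, the mean-squared-difference distance, and the threshold parameter are all assumptions made for this example.

```python
import math


def distance(point, center):
    """Mean squared difference over the variables present in both vectors.

    None marks a missing value (an assumption of this sketch).
    """
    total, count = 0.0, 0
    for x, c in zip(point, center):
        if x is not None and c is not None:
            total += (x - c) ** 2
            count += 1
    return total / count if count else math.inf


def kmeans_with_outliers(data, centers, threshold, max_iter=100):
    """Batch k-means; cases farther than `threshold` from every centre get label -1."""
    centers = [list(c) for c in centers]
    labels = [None] * len(data)
    for _ in range(max_iter):
        new_labels = []
        for p in data:
            dists = [distance(p, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            new_labels.append(best if dists[best] <= threshold else -1)
        if new_labels == labels:  # no case changed cluster: stop
            break
        labels = new_labels
        # Recompute each centre from its members, ignoring missing values.
        for k in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == k]
            for j in range(len(centers[k])):
                vals = [p[j] for p in members if p[j] is not None]
                if vals:
                    centers[k][j] = sum(vals) / len(vals)
    return labels, centers


# Hypothetical toy data: two tight groups, two missing values, one remote case.
data = [(1.0, 1.0), (1.2, None), (0.8, 1.1),
        (5.0, 5.0), (5.2, 4.9), (None, 5.1),
        (20.0, 20.0)]
labels, centers = kmeans_with_outliers(data, [(1.0, 1.0), (5.0, 5.0)], threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Because outliers are excluded from the centre updates, this simple batch scheme carries no convergence guarantee of its own, hence the max_iter cap.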

Author information

Authors and Affiliations

Department of Management, University of St. Andrews, St. Katharine’s West, The Scores, St. Andrews, Fife, KY16 9AL, Scotland
D. Wishart

Editor information

Editors and Affiliations

Munich School of Management Institute of Corporate Development and Organization, University of Munich, Kaulbachstraße 45/1, 80539, Munich, Germany
Manfred Schwaiger
Department of Mathematical Methods in Economics, University of Augsburg, Universitätsstraße 16, 86159, Augsburg, Germany
Otto Opitz

About this paper

Cite this paper

Wishart, D. (2003). k-Means Clustering with Outlier Detection, Mixed Variables and Missing Values. In: Schwaiger, M., Opitz, O. (eds) Exploratory Data Analysis in Empirical Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55721-7_23

DOI: https://doi.org/10.1007/978-3-642-55721-7_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44183-0
Online ISBN: 978-3-642-55721-7
eBook Packages: Springer Book Archive
