Synonyms
Bi-clustering; Co-clustering; Correlation clustering; Oriented clustering; Pattern-based clustering; Projected clustering
Definition
Cluster analysis aims at finding a set of subsets (i.e., a clustering) of objects in a data set. A meaningful clustering reflects a natural grouping of the data. In high-dimensional data, irrelevant attributes and correlated attributes make any natural grouping hard to detect. Specialized techniques therefore aim at finding clusters in subspaces of a high-dimensional data space.
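The effect of irrelevant attributes can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration and is not taken from the entry: the two cluster centers, the number of noise attributes, and the helper names `make_data` and `contrast`. Two clusters are separated only in attributes 0 and 1, and the sketch compares how clearly they stand out in that subspace versus in the full space.

```python
import math
import random


def make_data(n_per_cluster, n_noise_dims, rng):
    """Two clusters that differ only in attributes 0 and 1.

    All remaining attributes are uniform noise (hypothetical setup).
    """
    points, labels = [], []
    for label, center in enumerate([(0.2, 0.2), (0.8, 0.8)]):
        for _ in range(n_per_cluster):
            relevant = [rng.gauss(c, 0.03) for c in center]
            noise = [rng.random() for _ in range(n_noise_dims)]
            points.append(relevant + noise)
            labels.append(label)
    return points, labels


def contrast(points, labels, dims):
    """Mean between-cluster distance divided by mean within-cluster distance.

    Euclidean distances are computed on the attributes listed in dims only.
    A ratio near 1 means the grouping is barely visible in that subspace.
    """
    dims = list(dims)
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.sqrt(sum((points[i][k] - points[j][k]) ** 2 for k in dims))
            (within if labels[i] == labels[j] else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


rng = random.Random(0)  # fixed seed so the run is repeatable
points, labels = make_data(n_per_cluster=30, n_noise_dims=50, rng=rng)

full_space = contrast(points, labels, range(len(points[0])))
subspace = contrast(points, labels, [0, 1])

print(f"contrast in full space:       {full_space:.2f}")
print(f"contrast in subspace {{0, 1}}:  {subspace:.2f}")
```

Under these assumptions the ratio should be far above 1 in the relevant subspace and close to 1 in the full space, which is the situation that motivates searching for clusters in subspaces. The sketch only measures the effect; it does not implement any particular subspace clustering algorithm.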
Historical Background
While different weightings of attributes have been in use since clusters were derived by hand, the problem of finding a cluster based on a subset of attributes, together with a specialized solution, was first described in 1972 by Hartigan [1]. Research did not focus on the problem until 1998, however, triggered by modern capabilities for massive acquisition of high-dimensional data in many scientific and economic domains and by the first general approaches to the problem [2, 3, 4]. The...
Recommended Reading
Hartigan JA. Direct clustering of a data matrix. J Am Stat Assoc. 1972;67(337):123–29.
Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 94–105.
Aggarwal CC, Procopiuc CM, Wolf JL, Yu PS, Park JS. Fast algorithms for projected clustering. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1999. p. 61–72.
Aggarwal CC, Yu PS. Finding generalized projected clusters in high dimensional space. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 70–81.
Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform. 2004;1(1):24–45.
Kriegel HP, Kröger P, Zimek A. Clustering high dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data (TKDD). 2009;3(1):1–58.
Kriegel HP, Kröger P, Zimek A. Subspace clustering. Wiley Interdiscip Rev Data Min Knowl Disc. 2012;2(4):351–64.
Bellman R. Adaptive control processes. A guided tour. Princeton: Princeton University Press; 1961.
Beyer K, Goldstein J, Ramakrishnan R, Shaft U. When is “Nearest Neighbor” meaningful? In: Proceedings of the 7th International Conference on Database Theory; 1999. p. 217–35.
Houle ME, Kriegel HP, Kröger P, Schubert E, Zimek A. Can shared-neighbor distances defeat the curse of dimensionality? In: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management; 2010. p. 482–500.
Achtert E, Böhm C, David J, Kröger P, Zimek A. Global correlation clustering based on the Hough transform. Stat Anal Data Min. 2008;1(3):111–27.
Achtert E, Böhm C, Kriegel HP, Kröger P, Zimek A. Deriving quantitative models for correlation clusters. In: Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining; 2006. p. 4–13.
Zimek A, Vreeken J. The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives. Mach Learn. 2013;98(1–2):121–55.
Sim K, Gopalkrishnan V, Zimek A, Cong G. A survey on enhanced subspace clustering. Data Min Knowl Disc. 2013;26(2):332–97.
Achtert E, Kriegel HP, Schubert E, Zimek A. Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 1009–12.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Kröger, P., Zimek, A. (2018). Subspace Clustering Techniques. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_607
DOI: https://doi.org/10.1007/978-1-4614-8265-9_607
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering