Abstract
Detecting clusters (also referred to as groups or modules) of closely related objects is an important problem in data mining. Network modules are often defined as clusters. Partitioning-around-medoids (PAM) clustering and hierarchical clustering are often used in network applications. PAM (also known as k-medoid clustering) leads to relatively robust clusters but requires the user to specify the number k of clusters. Hierarchical clustering is attractive in network applications since (a) it does not require the number of clusters to be specified and (b) it works well when there are many singleton clusters and when cluster sizes vary greatly. However, hierarchical clustering requires the user to decide how to cut the branches of the resulting cluster tree. To this end, one can use the dynamicTreeCut method and R library. The dynamic hybrid method combines the advantages of hierarchical clustering and partitioning-around-medoids clustering. Network concepts are useful for defining cluster quality statistics (e.g., to measure the density or separability of clusters). To determine whether the cluster structure is preserved in another data set, one can use cross-tabulation-based preservation statistics. To measure the agreement between two clusterings, one can use the Rand index and other cross-tabulation-based statistics.
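As an illustration of the cross-tabulation-based agreement statistics mentioned above, the following is a minimal Python sketch of the Rand index, together with the commonly used chance-adjusted variant. It is not taken from the chapter: the function names `rand_index` and `adjusted_rand_index` and the toy labels are assumptions made for this example, and the adjusted index is one possible choice among the "other" statistics the abstract alludes to.

```python
from collections import Counter
from math import comb


def _pair_counts(labels_a, labels_b):
    """Count object pairs from the cross-tabulation of two clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are required")
    table = Counter(zip(labels_a, labels_b))  # cells of the cross-tabulation
    rows = Counter(labels_a)                  # cluster sizes in clustering A
    cols = Counter(labels_b)                  # cluster sizes in clustering B
    total = comb(n, 2)                                   # all object pairs
    same_both = sum(comb(c, 2) for c in table.values())  # together in A and B
    same_a = sum(comb(c, 2) for c in rows.values())      # together in A
    same_b = sum(comb(c, 2) for c in cols.values())      # together in B
    return total, same_both, same_a, same_b


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two clusterings agree."""
    total, same_both, same_a, same_b = _pair_counts(labels_a, labels_b)
    # Pairs separated in both clusterings: total - same_a - same_b + same_both
    agreements = total + 2 * same_both - same_a - same_b
    return agreements / total


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for the agreement expected by chance."""
    total, same_both, same_a, same_b = _pair_counts(labels_a, labels_b)
    expected = same_a * same_b / total
    maximum = (same_a + same_b) / 2
    if maximum == expected:
        # Degenerate case (e.g., both clusterings put everything in one
        # cluster); returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return (same_both - expected) / (maximum - expected)


if __name__ == "__main__":
    a = [1, 1, 2, 2]  # hypothetical module labels from one clustering
    b = [1, 1, 1, 2]  # hypothetical module labels from another clustering
    print(rand_index(a, b))
    print(adjusted_rand_index(a, b))
```

Both functions depend only on the cross-tabulation of the two label vectors, so they are unaffected by how the clusters happen to be named in either clustering.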
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Horvath, S. (2011). Clustering Procedures and Module Detection. In: Weighted Network Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8819-5_8
DOI: https://doi.org/10.1007/978-1-4419-8819-5_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-8818-8
Online ISBN: 978-1-4419-8819-5
eBook Packages: Biomedical and Life Sciences (R0)