Clustering Procedures and Module Detection

Horvath, Steve

doi:10.1007/978-1-4419-8819-5_8

Steve Horvath²

3563 Accesses

Abstract

Detecting clusters (also referred to as groups or modules) of closely related objects is an important problem in data mining in general. Network modules are often defined as clusters. Partitioning-around-medoids (PAM) clustering and hierarchical clustering are often used in network applications. Partitioning-around-medoids (aka. k-medoid clustering) leads to relatively robust clusters but requires that the user specify the number k of clusters. Hierarchical clustering is attractive in network applications since (a) it does not require the specification of the number of clusters and (b) it works well when there are many singleton clusters and when cluster sizes vary greatly. But hierarchical clustering requires the user to determine how to cut branches of the resulting cluster tree. Toward this end, one can use the dynamicTreeCut method and R library. The dynamic hybrid method combines the advantages of hierarchical clustering and partitioning-around-medoids clustering. Network concepts are useful for defining cluster quality statistics (e.g., to measure the density or separability of clusters). To determine whether the cluster structure is preserved in another data sets, one can use cross-tabulation-based preservation statistics. To measure the agreement between two clusterings, one can use the Rand index and other cross-tabulation-based statistics.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 149.00; Price excludes VAT (USA)

Softcover Book: USD 199.99; Price excludes VAT (USA)

Hardcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Carlson M, Zhang B, Fang Z, Mischel P, Horvath S, Nelson SF (2006) Gene connectivity, function, and sequence conservation: Predictions from modular yeast co-expression networks. BMC Genomics 7(7):40
Article PubMed Google Scholar
Dong J, Horvath S (2007) Understanding network concepts in modules. BMC Syst Biol 1(1):24
Article PubMed Google Scholar
Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 3(7):RESEARCH0036
Google Scholar
Gargalovic PS, Imura M, Zhang B, Gharavi NM, Clark MJ, Pagnon J, Yang WP, He A, Truong A, Patel S, Nelson SF, Horvath S, Berliner JA, Kirchgessner TG, Lusis AJ (2006) Identification of inflammatory gene modules based on variations of human endothelial cell responses to oxidized lipids. Proc Natl Acad Sci USA 103(34):12741–12746
Article PubMed CAS Google Scholar
Ghazalpour A, Doss S, Zhang B, Plaisier C, Wang S, Schadt EE, Thomas A, Drake TA, Lusis AJ, Horvath S (2006) Integrating genetics and network analysis to characterize genes related to mouse weight. PloS Genet 2(2):8
Article Google Scholar
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistcal learning: Data mining, inference, and prediction. Springer, New York
Google Scholar
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Article Google Scholar
Kapp AV, Tibshirani R (2007) Are clusters found in one dataset present in another dataset? Biostat 8(1):9–31
Article Google Scholar
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: An introduction to cluster analysis. Wiley, New York
Book Google Scholar
Langfelder P, Horvath S (2011) Fast R functions for robust correlations and hierarchical clustering. J Stat Software. In press
Google Scholar
Langfelder P, Zhang B, Horvath S (2007) Defining clusters from a hierarchical cluster tree: The Dynamic Tree Cut library for R. Bioinformatics 24(5):719–720
Article PubMed Google Scholar
Langfelder P, Luo R, Oldham MC, Horvath S (2011) Is my network module preserved and reproducible? Plos Comput Biol 7(1):e1001057
Article PubMed CAS Google Scholar
Oldham MC, Langfelder P, Horvath S (2011) Sample networks for enhancing cluster analysis of genomic data: Application to huntington’s disease. Technical Report
Google Scholar
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Article Google Scholar
Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon 11:33–40
Article Google Scholar
Tibshirani R, Walther G (2005) Cluster validation by prediction strength. J Comput Graph Stat 14:511–528
Article Google Scholar
Yip A, Horvath S (2007) Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinform 8(8):22
Article Google Scholar

Download references

Author information

Authors and Affiliations

University of California, Los Angeles, Los Angeles, CA, 90095-7088, USA
Steve Horvath (Professor of Human Genetics and Biostatistics)

Authors

Steve Horvath
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Steve Horvath .

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Horvath, S. (2011). Clustering Procedures and Module Detection. In: Weighted Network Analysis. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8819-5_8

Download citation

DOI: https://doi.org/10.1007/978-1-4419-8819-5_8
Published: 22 March 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-8818-8
Online ISBN: 978-1-4419-8819-5
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)

Publish with us

Policies and ethics