Abstract
Integrated data analysis is introduced as the intermediate level of a systems biology approach for analysing different ‘omics’ datasets, i.e., genome-wide measurements of transcripts, protein levels or protein-protein interactions, and metabolite levels, with the aim of generating a coherent understanding of biological function. In this chapter we focus on methods of correlation analysis, ranging from simple pairwise correlation to kernel canonical correlation, that have recently been applied in molecular biology. Several examples are presented to illustrate their application. The input data for this kind of analysis frequently originate from different experimental platforms. Preprocessing steps such as data normalisation and missing value estimation are therefore inherent to the approach. The corresponding procedures, potential pitfalls and biases, and available software solutions are reviewed. Because omics-profiling experiments yield a large number of observations that are tested simultaneously, multiple testing correction techniques must be applied.
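As a rough illustration of the simplest end of this range, the sketch below computes pairwise Pearson correlations between transcript and metabolite profiles and then corrects the resulting p-values for multiple testing. It is a minimal sketch under stated assumptions, not the procedure used in the chapter: the toy profiles and the names `transcripts` and `metabolites` are hypothetical, the p-values come from a simple permutation test, and the Benjamini-Hochberg adjustment is one common correction chosen here for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Two-sided permutation p-value for the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one estimate keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: one value per sample, same sample order in both sets.
transcripts = {
    "geneA": [1.0, 2.1, 2.9, 4.2, 5.1, 5.8],
    "geneB": [3.0, 1.2, 4.1, 0.9, 2.2, 3.5],
}
metabolites = {
    "sucrose": [0.9, 2.0, 3.2, 3.9, 5.0, 6.1],
    "proline": [2.5, 2.4, 0.7, 3.1, 1.9, 2.8],
}

pairs = [(g, m) for g in transcripts for m in metabolites]
r_values = [pearson(transcripts[g], metabolites[m]) for g, m in pairs]
p_values = [permutation_pvalue(transcripts[g], metabolites[m]) for g, m in pairs]
q_values = benjamini_hochberg(p_values)

for (g, m), r, p, q in zip(pairs, r_values, p_values, q_values):
    print(f"{g} vs {m}: r={r:+.3f}  p={p:.4f}  adjusted p={q:.4f}")
```

A profile that is constant across samples has zero variance and would cause a division by zero in `pearson`; real data would need such profiles filtered out, along with normalisation and missing value handling, before this step.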
Copyright information
© 2007 Birkhäuser Verlag/Switzerland
About this chapter
Cite this chapter
Steinfath, M., Repsilber, D., Scholz, M., Walther, D., Selbig, J. (2007). Integrated data analysis for genome-wide research. In: Baginsky, S., Fernie, A.R. (eds) Plant Systems Biology. Experientia Supplementum, vol 97. Birkhäuser Basel. https://doi.org/10.1007/978-3-7643-7439-6_13
DOI: https://doi.org/10.1007/978-3-7643-7439-6_13
Publisher Name: Birkhäuser Basel
Print ISBN: 978-3-7643-7261-3
Online ISBN: 978-3-7643-7439-6
eBook Packages: Biomedical and Life Sciences (R0)