
Guidelines for Bioinformatics and the Statistical Analysis of Omic Data

Chapter in: Omics Approaches to Understanding Muscle Biology

Part of the book series: Methods in Physiology (METHPHYS)

Abstract

This chapter is a resource for those designing omics experiments and for those analyzing the data such experiments produce. It is organized into two parts: the first focuses on bioinformatics tools and techniques, and the second focuses on statistical analysis. It is intended as a high-level instructional chapter for readers who want to perform their own analyses, not as a comprehensive treatment of either area. The first part discusses the bioinformatics tools and algorithms used in genomics and transcriptomics. It describes typical workflows and the tools available for performing an omic experiment, and it underscores the importance of understanding not only the tools being used but also the algorithms that underlie them. The second part describes general study design principles that should be taken into account before an experiment begins. It covers basic principles of statistical analysis and commonly used methods; it is not a comprehensive discussion of statistical theory, nor does it describe more complex statistical models. The guidance of a statistician is advised for complex study designs, hypotheses, or statistical models.


References

  1. Hood, L., & Galas, D. (2003). The digital code of DNA. Nature, 421(6921), 444–448.

    Article  PubMed  CAS  Google Scholar 

  2. Dahm, R. (2008). Discovering DNA: Friedrich Miescher and the early years of nucleic acid research. Human Genetics, 122(6), 565–581.

    Article  CAS  PubMed  Google Scholar 

  3. Levy, S. E., & Myers, R. M. (2016). Advancements in next-generation sequencing. Annual Review of Genomics and Human Genetics, 17(1), 95–115.

    Article  CAS  PubMed  Google Scholar 

  4. Reis-Filho, J. S. (2009). Next-generation sequencing. Breast Cancer Research, 11(S3), S12.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  5. Cock, P. J. A., Fields, C. J., Goto, N., Heuer, M. L., & Rice, P. M. (2010). The sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research, 38(6), 1767–1771.

    Article  CAS  PubMed  Google Scholar 

  6. Ewing, B., Hillier, L., Wendl, M. C., & Green, P. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research, 8(3), 175–185.

    Article  CAS  PubMed  Google Scholar 

  7. Ewing, B., & Green, P. (1998). Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research, 8(3), 186–194.

    Article  CAS  PubMed  Google Scholar 

  8. Andrews, S. (2010). FastQC a quality control tool for high throughput sequence data. Retrieved November 25, 2018 from http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

  9. Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet Journal, 17(1), 10.

    Article  Google Scholar 

  10. Joshi, N. A., & Fass, J. N. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files.

    Google Scholar 

  11. Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Adjeroh, D., Bell, T., & Mukherjee, A. (2008). The Burrows-Wheeler transform: Data compression, suffix arrays, and pattern matching. New York: Springer.

    Book  Google Scholar 

  13. Lam, T. W., Sung, W. K., Tam, S. L., Wong, C. K., & Yiu, S. M. (2008). Compressed indexing and local alignment of DNA. Bioinformatics, 24(6), 791–797.

    Article  CAS  PubMed  Google Scholar 

  14. Li, H., et al. (2009). The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), 2078–2079.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  15. McKenna, A., et al. (2010). The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9), 1297–1303.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Garrison, E., & Marth, G. (2016). Haplotype-based variant detection from short-read sequencing.

    Google Scholar 

  17. Kobayashi, M., et al. (2017). Heap: A highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data. DNA Research, 24(4), 397–405.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Tattini, L., D’Aurizio, R., & Magi, A. (2015). Detection of genomic structural variants from next-generation sequencing data. Frontiers in Bioengineering and Biotechnology, 3, 92.

    Article  PubMed  PubMed Central  Google Scholar 

  19. Chen, K., et al. (2009). BreakDancer: An algorithm for high-resolution mapping of genomic structural variation. Nature Methods, 6(9), 677–681.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Korbel, J. O., et al. (2009). PEMer: A computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biology, 10(2), R23.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  21. Lee, S., Hormozdiari, F., Alkan, C., & Brudno, M. (2009). MoDIL: Detecting small indels from clone-end sequencing with mixtures of distributions. Nature Methods, 6(7), 473–474.

    Article  CAS  PubMed  Google Scholar 

  22. Magi, A., Tattini, L., Pippucci, T., Torricelli, F., & Benelli, M. (2012). Read count approach for DNA copy number variants detection. Bioinformatics, 28(4), 470–478.

    Article  CAS  PubMed  Google Scholar 

  23. Magi, A., et al. (2013). EXCAVATOR: Detecting copy number variants from whole-exome sequencing data. Genome Biology, 14(10), R120.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Abyzov, A., Urban, A. E., Snyder, M., & Gerstein, M. (2011). CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Research, 21(6), 974–984.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Schröder, J., et al. (2014). Socrates: Identification of genomic rearrangements in tumour genomes by re-aligning soft clipped reads. Bioinformatics, 30(8), 1064–1072.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  26. Karakoc, E., et al. (2012). Detection of structural variants and indels within exome data. Nature Methods, 9(2), 176–178.

    Article  CAS  Google Scholar 

  27. Earl, D., et al. (2011). Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Research, 21(12), 2224–2241.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Iqbal, Z., Caccamo, M., Turner, I., Flicek, P., & McVean, G. (2012). De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics, 44(2), 226–232.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Nijkamp, J. F., van den Broek, M. A., Geertman, J.-M. A., Reinders, M. J. T., Daran, J.-M. G., & de Ridder, D. (2012). De novo detection of copy number variation by co-assembly. Bioinformatics, 28(24), 3195–3202.

    Article  CAS  PubMed  Google Scholar 

  30. Rausch, T., Zichner, T., Schlattl, A., Stutz, A. M., Benes, V., & Korbel, J. O. (2012). DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics, 28(18), i333–i339.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Layer, R. M., Chiang, C., Quinlan, A. R., & Hall, I. M. (2014). LUMPY: a probabilistic framework for structural variant discovery. Genome Biology, 15(6), R84.

    Article  PubMed  PubMed Central  Google Scholar 

  32. Wong, K., Keane, T. M., Stalker, J., & Adams, D. J. (2010). Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly. Genome Biology, 11(12), R128.

    Article  PubMed  PubMed Central  Google Scholar 

  33. Jeffares, D. C., et al. (2017). Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nature Communications, 8, 14061.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. English, A. C., et al. (2015). Assessing structural variation in a personal genome—towards a human reference diploid genome. BMC Genomics, 16(1), 286.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  35. Wang, K., Li, M., & Hakonarson, H. (2010). ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38(16), e164.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  36. Sherry, S. T., et al. (2001). dbSNP: The NCBI database of genetic variation. Nucleic Acids Research, 29(1), 308–311.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. MacDonald, J. R., Ziman, R., Yuen, R. K. C., Feuk, L., & Scherer, S. W. (2014). The database of genomic variants: A curated collection of structural variation in the human genome. Nucleic Acids Research, 42(Database issue), D986–D992.

    Article  CAS  PubMed  Google Scholar 

  38. Landrum, M. J., et al. (2018). ClinVar: Improving access to variant interpretations and supporting evidence. Nucleic Acids Research, 46(D1), D1062–D1067.

    Article  CAS  PubMed  Google Scholar 

  39. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J., & Kircher, M. (2018). CADD: Predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Research, 47(D1), D886–D894.

    Article  PubMed Central  CAS  Google Scholar 

  40. Kircher, M., Witten, D. M., Jain, P., O’Roak, B. J., Cooper, G. M., & Shendure, J. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nature Genetics, 46(3), 310–315.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Cingolani, P., et al. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly, 6(2), 80–92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Cingolani, P., et al. (2012). Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Frontiers in Genetics, 3, 35.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Geoffroy, V., et al. (2018). AnnotSV: An integrated tool for structural variations annotation. Bioinformatics, 34(20), 3572–3574.

    Article  CAS  PubMed  Google Scholar 

  44. Freeman, W. M., Walker, S. J., & Vrana, K. E. (1999). Quantitative RT-PCR: Pitfalls and potential. BioTechniques, 26(1), 112–125.

    Article  CAS  PubMed  Google Scholar 

  45. Bumgarner, R. (2013). Overview of DNA microarrays: Types, applications, and their future. Current Protocols in Molecular Biology, 101(1), 22–21.

    Google Scholar 

  46. Solomon, M. J., Larsen, P. L., & Varshavsky, A. (1988). Mapping protein-DNA interactions in vivo with formaldehyde: Evidence that histone H4 is retained on a highly transcribed gene. Cell, 53(6), 937–947.

    Article  CAS  PubMed  Google Scholar 

  47. Van Gelder, R. N., von Zastrow, M. E., Yool, A., Dement, W. C., Barchas, J. D., & Eberwine, J. H. (1990). Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proceedings of the National Academy of Sciences of the United States of America, 87(5), 1663–1667.

    Article  PubMed  PubMed Central  Google Scholar 

  48. Shalon, D., Smith, S. J., & Brown, P. O. (1996). A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Research, 6(7), 639–645.

    Article  CAS  PubMed  Google Scholar 

  49. Ritchie, M. E., et al. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  50. Gautier, L., Cope, L., Bolstad, B. M., & Irizarry, R. A. (2004). Affy--analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20(3), 307–315.

    Article  CAS  PubMed  Google Scholar 

  51. Dunning, M. J., Smith, M. L., Ritchie, M. E., & Tavare, S. (2007). Beadarray: R classes and methods for Illumina bead-based data. Bioinformatics, 23(16), 2183–2184.

    Article  CAS  PubMed  Google Scholar 

  52. Bolstad, B. M., Irizarry, R., Astrand, M., & Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2), 185–193.

    Article  CAS  PubMed  Google Scholar 

  53. Carvalho, B. S., & Irizarry, R. A. (2010). A framework for oligonucleotide microarray preprocessing. Bioinformatics, 26(19), 2363–2367.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Warnes, G. R., Bolker, B., Bonebakker, L., Gentleman, R., Huber, W., & Liaw, A. (2009). gplots: Various R programming tools for plotting data. R Packag. version 2.

    Google Scholar 

  55. Student. (1908). The probable error of a mean. Biometrika. Retreived May 07, 2016, from http://seismo.berkeley.edu/~kirchner/eps_120/Odds_n_ends/Students_original_paper.pdf.

  56. Fisher, R. A. (1919). XV.—The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52(02), 399–433.

    Article  Google Scholar 

  57. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57(1), 289–300.

    Google Scholar 

  58. Bonferroni, C. (1936). Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8, 3–62.

    Google Scholar 

  59. Schadt, E. E., Turner, S., & Kasarskis, A. (2010). A window into third-generation sequencing. Human Molecular Genetics, 19(R2), R227–R240.

    Article  CAS  PubMed  Google Scholar 

  60. Mikheyev, A. S., & Tin, M. M. Y. (2014). A first look at the Oxford Nanopore MinION sequencer. Molecular Ecology Resources, 14(6), 1097–1102.

    Article  CAS  PubMed  Google Scholar 

  61. Eisenstein, M. (2012). Oxford Nanopore announcement sets sequencing sector abuzz. Nature Biotechnology, 30(4), 295–296.

    Article  CAS  PubMed  Google Scholar 

  62. Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., & Salzberg, S. L. (2013). TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 14(4), R36.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  63. Trapnell, C., et al. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7(3), 562–578.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Trapnell, C., et al. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 28(5), 511–515.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods, 9(4), 357–359.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Ferragina, P., & Manzini, G. (2001). An experimental study of a compressed index. Information Sciences, 135(1–2), 13–28.

    Article  Google Scholar 

  67. Dobin, A., et al. (2013). STAR: Ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15–21.

    Article  CAS  PubMed  Google Scholar 

  68. Kim, D., Langmead, B., & Salzberg, S. L. (2015). HISAT: A fast spliced aligner with low memory requirements. Nature Methods, 12(4), 357–360.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Pertea, M., Kim, D., Pertea, G. M., Leek, J. T., & Salzberg, S. L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature Protocols, 11(9), 1650–1667.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Pertea, M., Pertea, G. M., Antonescu, C. M., Chang, T.-C., Mendell, J. T., & Salzberg, S. L. (2015). StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology, 33(3), 290–295.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Frazee, A. C., Pertea, G., Jaffe, A. E., Langmead, B., Salzberg, S. L., & Leek, J. T. (2015). Ballgown bridges the gap between transcriptome assembly and expression analysis. Nature Biotechnology, 33(3), 243–246.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Wang, L., Wang, S., & Li, W. (2012). RSeQC: Quality control of RNA-seq experiments. Bioinformatics, 28(16), 2184–2185.

    Article  CAS  PubMed  Google Scholar 

  73. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., & Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods, 5(7), 621–628.

    Article  CAS  PubMed  Google Scholar 

  74. Li, B., Ruotti, V., Stewart, R. M., Thomson, J. A., & Dewey, C. N. (2010). RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics, 26(4), 493–500.

    Article  PubMed  CAS  Google Scholar 

  75. Li, B., & Dewey, C. N. (2011). RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12(1), 323.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1976). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1–22.

    Google Scholar 

  77. Anders, S., Pyl, P. T., & Huber, W. (2015). HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics, 31(2), 166–169.

    Article  CAS  PubMed  Google Scholar 

  78. Liao, Y., Smyth, G. K., & Shi, W. (2014). featureCounts: An efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 30(7), 923–930.

    Article  CAS  PubMed  Google Scholar 

  79. Lawrence, M., et al. (2013). Software for computing and annotating genomic ranges. PLoS Computational Biology, 9(8), e1003118.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Soneson, C., Love, M. I., & Robinson, M. D. (2015). Differential analyses for RNA-seq: Transcript-level estimates improve gene-level inferences. F1000Research, 4, 1521.

    Article  PubMed  CAS  Google Scholar 

  81. Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139–140.

    Article  CAS  PubMed  Google Scholar 

  82. Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  83. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135(3), 370.

    Article  Google Scholar 

  84. Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16(2), 117–186.

    Article  Google Scholar 

  85. Feng, J., Meyer, C. A., Wang, Q., Liu, J. S., Shirley Liu, X., & Zhang, Y. (2012). GFOLD: A generalized fold change for ranking differentially expressed genes from RNA-seq data. Bioinformatics, 28(21), 2782–2788.

    Article  CAS  PubMed  Google Scholar 

  86. Tarazona, S., et al. (2015). Data quality aware analysis of differential expression in RNA-seq with NOISeq R/Bioc package. Nucleic Acids Research, 43(21), e140.

    PubMed  PubMed Central  Google Scholar 

  87. Toedling, J., & Huber, W. (2008). Analyzing ChIP-chip data using bioconductor. PLoS Computational Biology, 4(11), e1000227.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  88. Toedling, J., Sklyar, O., & Huber, W. (2007). Ringo – an R/bioconductor package for analyzing ChIP-chip readouts. BMC Bioinformatics, 8(1), 221.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  89. Durinck, S., et al. (2005). BioMart and bioconductor: A powerful link between biological databases and microarray data analysis. Bioinformatics, 21(16), 3439–3440.

    Article  CAS  PubMed  Google Scholar 

  90. Alexa, A., Rahnenfuhrer, J., & Lengauer, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics, 22(13), 1600–1607.

    Article  CAS  PubMed  Google Scholar 

  91. Zhang, Y., et al. (2008). Model-based Analysis of ChIP-Seq (MACS). Genome Biology, 9(9), R137.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  92. Xu, S., Grullon, S., Ge, K., & Peng, W. (2014). Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells. Methods in Molecular Biology, 1150, 97.

    Article  CAS  PubMed  Google Scholar 

  93. Hayatsu, H. (2008). Discovery of bisulfite-mediated cytosine conversion to uracil, the key reaction for DNA methylation analysis – a personal account. Proceedings of the Japan Academy. Series B, Physical and Biological Sciences, 84(8), 321–330.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Morris, T. J., et al. (2014). ChAMP: 450k chip analysis methylation pipeline. Bioinformatics, 30(3), 428–430.

    Article  CAS  PubMed  Google Scholar 

  95. Tian, Y., et al. (2017). ChAMP: Updated methylation analysis pipeline for illumina BeadChips. Bioinformatics, 33(24), 3982–3984.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Aryee, M. J., et al. (2014). Minfi: A flexible and comprehensive bioconductor package for the analysis of infinium DNA methylation microarrays. Bioinformatics, 30(10), 1363–1369.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E., & Storey, J. D. (2012). The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics, 28(6), 882–883.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Carson Sievert, P. T. I., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M., & Despouy, P. (2018). Create interactive web graphics via ‘plotly.js’ [R package plotly version 4.8.0]. Comprehensive R Archive Network (CRAN).

    Google Scholar 

  99. Krueger, F., & Andrews, S. R. (2011). Bismark: A flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics, 27(11), 1571–1572.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Chen, P.-Y., Cokus, S. J., & Pellegrini, M. (2010). BS Seeker: Precise mapping for bisulfite sequencing. BMC Bioinformatics, 11(1), 203.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Kreck, B., Marnellos, G., Richter, J., Krueger, F., Siebert, R., & Franke, A. (2012). B-SOLANA: An approach for the analysis of two-base encoding bisulfite sequencing data. Bioinformatics, 28(3), 428–429.

    Article  CAS  PubMed  Google Scholar 

  102. Frith, M. C., Mori, R., & Asai, K. (2012). A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Research, 40(13), e100.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  103. Saito, Y., Tsuji, J., & Mituyama, T. (2014). Bisulfighter: Accurate detection of methylated cytosines and differentially methylated regions. Nucleic Acids Research, 42(6), e45.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Xi, Y., & Li, W. (2009). BSMAP: Whole genome bisulfite sequence MAPping program. BMC Bioinformatics, 10(1), 232.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  105. Assenov, Y., Müller, F., Lutsik, P., Walter, J., Lengauer, T., & Bock, C. (2014). Comprehensive analysis of DNA methylation data with RnBeads. Nature Methods, 11(11), 1138–1140.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Saito, Y., & Mituyama, T. (2015). Detection of differentially methylated regions from bisulfite-seq data by hidden Markov models incorporating genome-wide methylation level distributions. BMC Genomics, 16(Suppl 12), S3.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  107. Song, Q., Decato, B., Hong, E. E., Zhou, M., & Fang, F. (2013). A reference methylome database and analysis pipeline to facilitate integrative and comparative epigenomics. PLoS One, 8(12), 81148.

    Article  CAS  Google Scholar 

  108. Hansen, K. D., Langmead, B., & Irizarry, R. A. (2012). BSmooth: From whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology, 13(10), R83.

    Article  PubMed  PubMed Central  Google Scholar 

  109. Hebestreit, K., Dugas, M., & Klein, H.-U. (2013). Detection of significantly differentially methylated regions in targeted bisulfite sequencing data. Bioinformatics, 29(13), 1647–1653.

    Article  CAS  PubMed  Google Scholar 

  110. Wreczycka, K., Gosdschan, A., Yusuf, D., Grüning, B., Assenov, Y., & Akalin, A. (2017). Strategies for analyzing bisulfite sequencing data. Journal of Biotechnology, 261, 105–115.

    Article  CAS  PubMed  Google Scholar 

  111. Tsuji, J., & Weng, Z. (2015). Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data. Briefings in Bioinformatics, 17(6), bbv103.

    Article  Google Scholar 

  112. Eberwine, J., et al. (1992). Analysis of gene expression in single live neurons. Proceedings of the National Academy of Sciences of the United States of America, 89(7), 3010–3014.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. Hwang, B., Lee, J. H., & Bang, D. (2018). Single-cell RNA sequencing technologies and bioinformatics pipelines. Experimental and Molecular Medicine, 50(8), 96.

    Article  CAS  PubMed Central  Google Scholar 

  114. Van Der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

    Google Scholar 

  115. Butler, A., Hoffman, P., Smibert, P., Papalexi, E., & Satija, R. (2018). Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5), 411–420.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  116. Afgan, E., et al. (2018). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Research, 46(W1), W537–W544.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Ashburner, M., et al. (2000). Gene ontology: Tool for the unification of biology. Nature Genetics, 25(1), 25–29.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Huang, D. W., Sherman, B. T., & Lempicki, R. A. (2009). Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research, 37(1), 1–13.

    Article  CAS  Google Scholar 

  119. Fisher, R. A. (1922). On the interpretation of χ2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1), 87.

    Article  Google Scholar 

  120. Ludbrook, J. (2008). Analysis of 2 × 2 tables of frequencies: Matching test to experimental design. International Journal of Epidemiology, 37(6), 1430–1435.

    Article  PubMed  Google Scholar 

  121. Huang, D. W., Sherman, B. T., & Lempicki, R. A. (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1), 44–57.

    Article  CAS  Google Scholar 

  122. Falcon, S., & Gentleman, R. (2007). Using GOstats to test gene lists for GO term association. Bioinformatics, 23(2), 257–258.

    Article  CAS  PubMed  Google Scholar 

  123. Maere, S., Heymans, K., & Kuiper, M. (2005). BiNGO: A cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 21(16), 3448–3449.

    Article  CAS  PubMed  Google Scholar 

  124. Subramanian, A., et al. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545–15550.

    Article  CAS  Google Scholar 

  125. Lee, H. K., Braynen, W., Keshav, K., & Pavlidis, P. (2005). ErmineJ: Tool for functional analysis of gene expression data sets. BMC Bioinformatics, 6(1), 269.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  126. Al-Shahrour, F., et al. (2007). From genes to functional classes in the study of biological systems. BMC Bioinformatics, 8, 114.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  127. Nam, D., Kim, S.-B., Kim, S.-K., Yang, S., Kim, S.-Y., & Chu, I.-S. (2006). ADGO: Analysis of differentially expressed gene sets using composite GO annotation. Bioinformatics, 22(18), 2249–2253.

    Article  CAS  PubMed  Google Scholar 

  128. Nogales-Cadenas, R., et al. (2009). GeneCodis: Interpreting gene lists through enrichment analysis and integration of diverse biological information. Nucleic Acids Research, 37(Web Server issue), W317–W322.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1), 27–30.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Finn, R. D., et al. (2014). Pfam: The protein families database. Nucleic Acids Research, 42(Database issue), D222–D230.

    Article  CAS  PubMed  Google Scholar 

  131. Matys, V., et al. (2003). TRANSFAC: Transcriptional regulation, from patterns to profiles. Nucleic Acids Research, 31(1), 374–378.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Warde-Farley, D., et al. (2010). The GeneMANIA prediction server: Biological network integration for gene prioritization and predicting gene function. Nucleic Acids Research, 38(Web Server issue), W214–W220.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A., & Tyers, M. (2006). BioGRID: A general repository for interaction datasets. Nucleic Acids Research, 34(Database issue), D535–D539.

    Article  CAS  PubMed  Google Scholar 

  134. Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17.

    Article  PubMed  Google Scholar 

  135. Langfelder, P., & Horvath, S. (2008). WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics, 9(1), 559.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  136. Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048.

  137. Warnes, G. R., Bolker, B., Bonebakker, L., Gentleman, R., Huber, W., Liaw, A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M., & Venables, B. (2016). gplots: Various R programming tools for plotting data. R Package Version, 2(4), 1.

  138. Walter, W., Sánchez-Cabo, F., & Ricote, M. (2015). GOplot: An R package for visually combining expression data with functional analysis. Bioinformatics, 31(17), 2912–2914.

  139. Ghosh, D., & Poisson, L. M. (2009). “Omics” data and levels of evidence for biomarker discovery. Genomics, 93, 13–16.

  140. Wheelock, A. M., & Wheelock, C. E. (2013). Trials and tribulations of ‘omics data analysis: Assessing quality of SIMCA-based multivariate models using examples from pulmonary medicine. Molecular BioSystems, 9, 2589.

  141. Kraus, W. L. (2015). Editorial: Would you like a hypothesis with those data? Omics and the age of discovery science. Molecular Endocrinology, 29(11), 1531–1534.

  142. Vaux, D. L., Fidler, F., & Cumming, G. (2012). Replicates and repeats—What is the difference and is it significant? A brief discussion of statistics and experimental design. EMBO Reports, 13(4), 291.

  143. Bell, G. (2016). Comment: Replicates and repeats. BMC Biology, 14, 28.

  144. Whitley, E., & Ball, J. (2002). Statistics review 4: Sample size calculations. Critical Care, 6(4), 335.

  145. Billoir, E., Navratil, V., & Blaise, B. J. (2015). Sample size calculation in metabolic phenotyping studies. Briefings in Bioinformatics, 16(5), 813–819.

  146. Urdan, T. C. (2010). Statistics in plain English (3rd ed.). New York: Routledge.

  147. Pett, M. A. (1997). Nonparametric statistics for health care research: Statistics for small samples and unusual distributions. Thousand Oaks, CA: Sage.

  148. Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B, 64(Part 3), 479–498.

  149. Feise, R. J. (2002). Do multiple outcome measures require p-value adjustment? BMC Medical Research Methodology, 2, 8.

  150. Chen, S. Y., Feng, Z., & Yi, X. (2017). A general introduction to adjustment for multiple comparisons. Journal of Thoracic Disease, 9(6), 1725–1729.

  151. Forshed, J. (2017). Experimental design in clinical ‘omics biomarker discovery. Journal of Proteome Research, 16, 3954–3960.

  152. Guyatt, G., Jaeschke, R., Heddle, N., Cook, D., Shannon, H., & Walter, S. (1995). Basic statistics for clinicians: 1. Hypothesis testing. CMAJ, 152(1), 27–32.

  153. Guyatt, G., Jaeschke, R., Heddle, N., Cook, D., Shannon, H., & Walter, S. (1995). Basic statistics for clinicians: 2. Interpreting study results: Confidence intervals. CMAJ, 152(2), 169–173.

  154. Guyatt, G., Walter, S., Shannon, H., Cook, D., Jaeschke, R., & Heddle, N. (1995). Basic statistics for clinicians: 4. Correlation and regression. CMAJ, 152(4), 497–504.

  155. Hanley, J. A., & Moodie, E. E. M. (2011). Sample size, precision and power calculations: A unified approach. Journal of Biometrics and Biostatistics, 2, 5.

  156. Ioannidis, J. P. A., Tarone, R., & McLaughlin, J. K. (2011). The false-positive to false-negative ratio in epidemiologic studies. Epidemiology, 22(4), 450–456.

  157. Jaeschke, R., Guyatt, G., Shannon, H., Walter, S., Cook, D., & Heddle, N. (1995). Basic statistics for clinicians: 3. Assessing the effects of treatment: Measures of association. CMAJ, 152(3), 351–357.

  158. Mazzocchi, F. (2015). Could big data be the end of theory in science? A few remarks on the epistemology of data-driven science. EMBO Reports, 16(10), 1250–1255.

  159. Rajasundaram, D., & Selbig, J. (2016). More effort — More results: Recent advances in integrative ‘omics’ data analysis. Current Opinion in Plant Biology, 30, 57–61.

  160. Senn, S., & Bretz, F. (2007). Power and sample size when multiple endpoints are considered. Pharmaceutical Statistics, 6, 161–170.

  161. Altmäe, S., Esteban, F. J., Stavreus-Evers, A., Simon, C., Giudice, L., Lessey, B. A., Horcajadas, J. A., Macklon, N. S., D’Hooghe, T., Campoy, C., Fauser, B. C., Salamonsen, L. A., & Salumets, A. (2014). Guidelines for the design, analysis and interpretation of ‘omics’ data: Focus on human endometrium. Human Reproduction Update, 20(1), 12–28.

Author information

Corresponding author

Correspondence to Heather Gordish-Dressman.

Copyright information

© 2019 The American Physiological Society

About this chapter

Cite this chapter

Bhattacharya, S., Gordish-Dressman, H. (2019). Guidelines for Bioinformatics and the Statistical Analysis of Omic Data. In: Burniston, J., Chen, YW. (eds) Omics Approaches to Understanding Muscle Biology. Methods in Physiology. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-9802-9_4
