Abstract
RNA sequencing (RNA-seq) is an exciting technique that gives experimenters unprecedented access to information on transcriptome complexity. Costs are decreasing, data analysis methods are maturing, and the flexibility that RNA-seq affords will allow it to become the platform of choice for gene expression analysis. Here, we focus on differential expression (DE) analysis using RNA-seq, highlighting aspects of mapping reads to a reference transcriptome, quantification of expression levels, normalization for composition biases, statistical modeling to account for biological variability, and experimental design considerations. We also comment on recent developments beyond the analysis of DE using RNA-seq.
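Normalization for composition biases, one of the topics listed above, can be illustrated with a small sketch. The Python function below is a simplified, unweighted trimmed mean of log-ratios in the spirit of the TMM approach. For genes with nonzero counts in both samples it computes M (the log2 ratio of read proportions) and A (the average log2 proportion), trims a fraction of genes from each end of both distributions, and returns 2 raised to the mean of the remaining M values. The function name, the trimming fractions `m_trim` and `a_trim`, and the omission of precision weights are assumptions made for illustration. This is not the implementation used by any particular package.

```python
import math
from typing import Sequence


def trimmed_mean_log_ratio_factor(
    sample: Sequence[int],
    reference: Sequence[int],
    m_trim: float = 0.3,   # hypothetical: fraction trimmed from EACH end of the M values
    a_trim: float = 0.05,  # hypothetical: fraction trimmed from EACH end of the A values
) -> float:
    """Simplified, unweighted TMM-style scaling factor of `sample` relative to `reference`.

    Multiplying the sample's library size by the returned factor gives an
    effective library size that is robust to a few highly abundant genes.
    """
    n_s, n_r = sum(sample), sum(reference)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log-ratios are undefined when either count is zero
        p_s, p_r = y_s / n_s, y_r / n_r
        m_vals.append(math.log2(p_s / p_r))        # M: log fold change of proportions
        a_vals.append(0.5 * math.log2(p_s * p_r))  # A: average log abundance
    n = len(m_vals)
    if n == 0:
        raise ValueError("no genes with nonzero counts in both samples")

    def kept_by_rank(values: list, frac: float) -> set:
        # Drop floor(n * frac) genes from each end of the ranked values.
        order = sorted(range(n), key=lambda i: values[i])
        k = math.floor(n * frac)
        return set(order[k:n - k])

    keep = kept_by_rank(m_vals, m_trim) & kept_by_rank(a_vals, a_trim)
    if not keep:
        return 1.0  # nothing left after trimming; fall back to no adjustment
    mean_m = sum(m_vals[i] for i in keep) / len(keep)
    return 2.0 ** mean_m


if __name__ == "__main__":
    reference = [100, 100, 100, 100, 100, 100]
    sample = [100, 100, 100, 100, 100, 1000]  # last gene strongly up in the sample only
    factor = trimmed_mean_log_ratio_factor(sample, reference)
    print(round(factor, 6))  # 0.4
```

In this toy example one gene is ten times more abundant in the sample, which inflates the sample's library size (1500 versus 600) and, under plain library-size scaling, makes the five unchanged genes look 2.5-fold depleted. The trimmed mean discards the outlying gene and yields a factor of 0.4, so the effective library size 1500 × 0.4 = 600 matches the reference and the unchanged genes are no longer called differentially expressed.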
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Young, M.D., McCarthy, D.J., Wakefield, M.J., Smyth, G.K., Oshlack, A., Robinson, M.D. (2012). Differential Expression for RNA Sequencing (RNA-Seq) Data: Mapping, Summarization, Statistical Analysis, and Experimental Design. In: Rodríguez-Ezpeleta, N., Hackenberg, M., Aransay, A. (eds) Bioinformatics for High Throughput Sequencing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-0782-9_10
DOI: https://doi.org/10.1007/978-1-4614-0782-9_10
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-0781-2
Online ISBN: 978-1-4614-0782-9
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)