Abstract
The main task of phylogenetics is the reconstruction of trees, which represent evolutionary relationships. Tree representations include cladograms (branch lengths carry no meaning), phylograms (branch lengths are proportional to the amount of evolutionary change) and ultrametric trees (the path length from the root to every tip is equal, so they can be used with molecular clocks). Consensus methods, such as strict consensus or majority rule, summarize the information contained in multiple trees. In some cases, networks are a better choice of representation, as they allow reticulate relationships and can thereby visualize conflict or horizontal gene transfer. Many tree-building methods rely on explicit models of sequence evolution. Models of nucleotide substitution are nested and can be derived from a general model (GTR) by restricting free parameters (e.g. the rates of different types of substitution, base frequencies). All these models assume that evolutionary rates are homogeneous across alignment sites. As this assumption is often unrealistic, rate heterogeneity can be incorporated by assigning sites to rate categories according to a gamma distribution, by accounting for a proportion of invariant sites or by using site-heterogeneous CAT models. Amino acid models are broadly classified into empirical and mechanistic models. Empirical models are derived from large compilations of sequences in databases and specify the transition probabilities of all possible amino acid changes in a matrix (e.g. DAYHOFF or WAG). Mechanistic models include assumptions about biological processes or are formulated at the codon level. If models are nested, the most appropriate model can be selected with hierarchical likelihood ratio tests. Alternatively, model selection can rely on information criteria such as the AIC or BIC, which penalize the inclusion of additional parameters.
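The model selection criteria mentioned above can be illustrated with a minimal sketch. It uses the standard definitions AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL, where k is the number of free parameters and n the number of alignment sites. The log-likelihood values, the alignment length and the function names are hypothetical and chosen only for illustration; the parameter counts cover the substitution model only and ignore branch lengths.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def lrt_statistic(lnl_simple, lnl_complex):
    """Likelihood ratio test statistic for two nested models."""
    return 2 * (lnl_complex - lnl_simple)

# Hypothetical log-likelihoods and free substitution parameters
# (made-up numbers, not results from any real alignment).
candidates = {
    "JC69": (-5230.0, 0),
    "HKY85": (-5105.0, 4),
    "GTR": (-5100.0, 8),
}
n_sites = 1000  # assumed alignment length

def best_model(criterion):
    """Return the candidate name with the lowest criterion value."""
    return min(candidates, key=lambda name: criterion(*candidates[name]))

best_aic = best_model(aic)
best_bic = best_model(lambda lnl, k: bic(lnl, k, n_sites))
print(best_aic, best_bic)
```

With these made-up numbers the AIC favours GTR, whereas the BIC, which penalizes parameters more strongly for an alignment of this length, favours HKY85.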
The most widely applied methods of phylogenetic reconstruction are neighbour joining (NJ), maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI). Whereas NJ is a clustering method based on pairwise distances, the other three methods are character based and rely on an optimality criterion. Because all of these methods will turn any alignment into a tree, metrics are needed to measure the support for individual branches. Bootstrapping measures branch support by generating pseudoreplicates, whereas other measures are based on the likelihood ratio test (e.g. aLRT).
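As a rough illustration of how bootstrap pseudoreplicates are generated, the sketch below resamples alignment columns with replacement and computes support as the fraction of replicate trees that contain a given split. The toy alignment, the function names and the representation of a split as a frozenset of taxon names are assumptions made for this example; the tree-building step for each replicate is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment maps taxon name -> aligned sequence (all of equal length).
    The pseudoreplicate has the same taxa and the same number of sites.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

def split_support(replicate_splits, split):
    """Fraction of replicate trees whose split set contains `split`."""
    hits = sum(1 for splits in replicate_splits if split in splits)
    return hits / len(replicate_splits)

# Toy alignment (hypothetical data).
toy = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTTC",
    "taxonC": "ACGAACGTTC",
    "taxonD": "TCGAACGTTG",
}
rng = random.Random(42)  # fixed seed so the resampling is repeatable
replicate = bootstrap_alignment(toy, rng)

# Splits recovered from four hypothetical replicate trees.
ab = frozenset({"taxonA", "taxonB"})
ac = frozenset({"taxonA", "taxonC"})
replicate_splits = [{ab}, {ab}, {ac}, {ab}]
support_ab = split_support(replicate_splits, ab)
```

In a real analysis, a tree would be inferred from every pseudoreplicate with the same method used for the original alignment, and the splits of those trees would take the place of the hand-written `replicate_splits`.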
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Bleidorn, C. (2017). Phylogenetic Analyses. In: Phylogenomics. Springer, Cham. https://doi.org/10.1007/978-3-319-54064-1_8
DOI: https://doi.org/10.1007/978-3-319-54064-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54062-7
Online ISBN: 978-3-319-54064-1
eBook Packages: Biomedical and Life Sciences (R0)