
Abstract

Understanding adaptive patterns is especially difficult in the case of “evolutionary singularities,” i.e., traits that evolved in only one lineage in the clade of interest. New methods are needed to integrate our understanding of general phenotypic correlations and convergence within a clade when examining a single lineage in that clade. Here, we develop and apply a new method to investigate change along a single branch of an evolutionary tree; this method can be applied to any branch on a phylogeny, typically focusing on an a priori hypothesis for “exceptional evolution” along particular branches, for example in humans relative to other primates. Specifically, we use phylogenetic methods to predict trait values for a tip on the phylogeny based on a statistical (regression) model, phylogenetic signal (λ), and evolutionary relationships among species in the clade. We can then evaluate whether the observed value departs from the predicted value. We provide two worked examples in human evolution using original R scripts that implement this concept in a Bayesian framework. We also provide simulations that investigate the statistical validity of the approach. While multiple approaches can and should be used to investigate singularities in an evolutionary context—including studies of the rate of phenotypic change along a branch—our Bayesian approach provides a way to place confidence on the predicted values in light of uncertainty about the underlying evolutionary and statistical parameters.

The original version of this chapter was revised: Online Practical Material website has been updated. The erratum to this chapter is available at https://doi.org/10.1007/978-3-662-43550-2_23


References

  • Allman JM, Martin B (2000) Evolving brains. Scientific American Library, New York

  • Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate phylogeny. Evol Anthropol 19:114–118

  • Barrett R, Kuzawa CW, McDade T, Armelagos GJ (1998) Emerging and re-emerging infectious diseases: the third epidemiologic transition. Annu Rev Anthropol 27:247–271

  • Barton RA (1996) Neocortex size and behavioural ecology in primates. Proc R Soc Lond (Biol) 263:173–177

  • Barton RA, Venditti C (2013) Human frontal lobes are not relatively large. PNAS 110:9001–9006

  • Cooper N, Kamilar JM, Nunn CL (2012) Longevity and parasite species richness in mammals. PLoS One

  • Deaner RO, Isler K, Burkart J, van Schaik C (2007) Overall brain size, and not encephalization quotient, best predicts cognitive ability across non-human primates. Brain Behav Evol 70:115–124

  • Deaner RO, Nunn CL, van Schaik CP (2000) Comparative tests of primate cognition: different scaling methods produce different results. Brain Behav Evol 55:44–52

  • Diniz-Filho JAF, De Sant’ana CER, Bini LM (1998) An eigenvector method for estimating phylogenetic inertia. Evolution 52:1247–1262

  • Diniz-Filho JAF, Bini LM (2005) Modelling geographical patterns in species richness using eigenvector-based spatial filters. Global Ecol Biogeogr 14:177–185

  • Dunbar RIM (1993) Coevolution of neocortical size, group size and language in humans. Behav Brain Sci 16:681–735

  • Fagan WF, Pearson YE, Larsen EA, Lynch HJ, Turner JB, Staver H, Noble AE, Bewick S, Goldberg EE (2013) Phylogenetic prediction of the maximum per capita rate of population growth. Proc R Soc Lond (Biol) 280:20130523

  • Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15

  • Freckleton RP, Harvey PH, Pagel M (2002) Phylogenetic analysis and comparative data: a test and review of evidence. Am Nat 160:712–726

  • Garland T, Bennett AF, Rezende EL (2005) Phylogenetic approaches in comparative physiology. J Exp Biol 208:3015–3035

  • Garland T, Dickerman AW, Janis CM, Jones JA (1993) Phylogenetic analysis of covariance by computer simulation. Syst Biol 42:265–292

  • Garland T, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364

  • Garland T, Midford PE, Ives AR (1999) An introduction to phylogenetically based statistical methods, with a new method for confidence intervals on ancestral values. Am Zool 39:374–388

  • Gelman A (2004) Bayesian data analysis. Chapman & Hall/CRC, London/Boca Raton

  • Grafen A (1989) The phylogenetic regression. Philos Trans R Soc Lond (Biol) 326:119–157

  • Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford Series in Ecology and Evolution. Oxford University Press, Oxford

  • Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109

  • Hughes AL, Hughes MK (1995) Small genomes for better flyers. Nature 377:391. doi:10.1038/377391a0

  • Jungers WL (1978) Functional significance of skeletal allometry in Megaladapis in comparison to living prosimians. Am J Phys Anthropol 49:303–314

  • Kappeler PM, Silk JB (eds) (2009) Mind the gap: tracing the origins of human universals. Springer, Berlin

  • Lieberman D (2011) The evolution of the human head. Belknap Press, Cambridge

  • Liu J (2003) Monte Carlo strategies in scientific computing. Springer, Berlin

  • Maddison WP, Midford PE, Otto SP (2007) Estimating a binary character’s effect on speciation and extinction. Syst Biol 56:701–710

  • Martin R (2002) Primatology as an essential basis for biological anthropology. Evol Anthropol 11:3–6

  • Martin RD (1990) Primate origins and evolution. Chapman and Hall, London

  • Martins EP (1994) Estimating the rate of phenotypic evolution from comparative data. Am Nat 144:193–209

  • Martins EP, Hansen TF (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am Nat 149:646–667

  • McPeek MA (1995) Testing hypotheses about evolutionary change on single branches of a phylogeny using evolutionary contrasts. Am Nat 145:686–703

  • Mundry R, Nunn CL (2009) Stepwise model fitting and statistical inference: turning noise into signal pollution. Am Nat 173:119–123

  • Napier JR (1970) The roots of mankind. Smithsonian Institution Press, Washington

  • Napier JR, Walker AC (1967) Vertical clinging and leaping—a newly recognized category of locomotor behaviour of primates. Folia Primatol 6:204–219

  • Nee S (2006) Birth-death models in macroevolution. Annu Rev Ecol Evol Syst 37:1–17

  • Nunn CL (2002) A comparative study of leukocyte counts and disease risk in primates. Evolution 56:177–190

  • Nunn CL (2011) The comparative approach in evolutionary anthropology and biology. University of Chicago Press, Chicago

  • Nunn CL, Gittleman JL, Antonovics J (2000) Promiscuity and the primate immune system. Science 290:1168–1170

  • Nunn CL, Lindenfors P, Pursall ER, Rolff J (2009) On sexual dimorphism in immune function. Philos Trans R Soc Lond B Biol Sci 364:61–69. doi:10.1098/rstb.2008.0148

  • Nunn CL, van Schaik CP (2002) Reconstructing the behavioral ecology of extinct primates. In: Plavcan JM, Kay RF, Jungers WL, van Schaik CP (eds) Reconstructing behavior in the fossil record. Kluwer Academic/Plenum, New York, pp 159–216

  • O’Hara RB, Sillanpaay MJ (2009) A review of Bayesian variable selection methods: what, how and which. Bayesian Anal 4(1):85–118

  • O’Meara BC, Ane C, Sanderson MJ, Wainwright PC (2006) Testing for different rates of continuous trait evolution using likelihood. Evolution 60:922–933

  • Organ CL, Nunn CL, Machanda Z, Wrangham RW (2011) Phylogenetic rate shifts in feeding time during the evolution of Homo. Proc Natl Acad Sci USA 108:14555–14559

  • Organ CL, Shedlock AM (2009) Palaeogenomics of pterosaurs and the evolution of small genome size in flying vertebrates. Biol Lett 5:47–50

  • Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV (2007) Origin of avian genome size and structure in non-avian dinosaurs. Nature 446:180–184

  • Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N (2011) caper: comparative analyses of phylogenetics and evolution in R. http://R-Forge.R-project.org/projects/caper/

  • Pagel M (1997) Inferring evolutionary processes from phylogenies. Zool Scr 26:331–348

  • Pagel M (1999) Inferring the historical patterns of biological evolution. Nature 401:877–884

  • Pagel M (2002) Modelling the evolution of continuously varying characters on phylogenetic trees: the case of hominid cranial capacity. In: MacLeod N, Forey PL (eds) Morphology, shape and phylogeny. Taylor and Francis, London, pp 269–286

  • Pagel M, Lutzoni F (2002) Accounting for phylogenetic uncertainty in comparative studies of evolution and adaptation. In: Lässig M, Valleriani A (eds) Biological evolution and statistical physics. Springer, Berlin, pp 148–161

  • Pagel M, Meade A (2007) BayesTraits, version 1.0 (http://www.evolution.rdg.ac.uk). Reading, UK

  • Pagel MD (1994) The adaptationist wager. In: Eggleton P, Vane-Wright RI (eds) Phylogenetics and ecology. Academic, London, pp 29–51

  • Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290

  • Reader SM, Laland KN (2002) Social intelligence, innovation, and enhanced brain size in primates. PNAS 99:4436–4441

  • Revell L (2010) Phylogenetic signal and linear regression on species data. Methods Ecol Evol 1:319–329

  • Revell LJ (2008) On the analysis of evolutionary change along single branches in a phylogeny. Am Nat 172:140–147

  • Revell LJ (2011) phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol

  • Rodseth L, Wrangham RW, Harrigan AM, Smuts BB, Dare R, Fox R, King B, Lee P, Foley R, Muller J, Otterbein K, Strier K, Turke P, Wolpoff M (1991) The human community as a primate society. Curr Anthropol 32:221–254

  • Rohlf FJ (2001) Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution 55:2143–2160

  • Safi K, Pettorelli N (2010) Phylogenetic, spatial and environmental components of extinction risk in carnivores. Global Ecol Biogeogr 19:352–362

  • Sherwood CC, Bauernfeind AL, Bianchi S, Raghanti MA, Hof PR (2012) Human brain evolution writ large and small. In: Hofman M, Falk D (eds) Evolution of the primate brain: from neuron to behavior, vol 195. Elsevier, Amsterdam, pp 237–254

  • Sherwood CC, Subiaul F, Zawidzki TW (2008) A natural history of the human mind: tracing evolutionary changes in brain and cognition. J Anat 212:426–454

  • Tennie C, Call J, Tomasello M (2009) Ratcheting up the ratchet: on the evolution of cumulative culture. Philos Trans R Soc Lond B Biol Sci 364:2405–2415

  • Tooby J, DeVore I (1987) The reconstruction of hominid behavioral evolution through strategic modeling. In: Kinzey WG (ed) The evolution of human behavior: primate models. State University of New York Press, Albany, pp 183–237

  • van Schaik CP, van Noordwijk MA, Nunn CL (1999) Sex and social evolution in primates. In: Lee PC (ed) Comparative primate socioecology. Cambridge University Press, Cambridge, pp 204–240

  • Wrangham RW (2009) Catching fire: how cooking made us human. Basic Books, New York

Acknowledgments

We thank Luke Matthews, Tirthankar Dasgupta, László Zsolt Garamszegi, and two anonymous referees for helpful discussion and feedback. Joel Bray helped format the manuscript. This research was supported by the NSF (BCS-0923791 and BCS-1355902).

Author information

Correspondence to Charles L. Nunn.

Appendix: Phylogenetic Prediction for Extant and Extinct Species


  1. Mathematical Description of Method

Consider the following regression model for \( n \) different species:

$$ y_{i} = \alpha_{0} + \theta_{1} x_{i1} + \cdots + \theta_{m} x_{im} + \epsilon_{i} $$
(21.1)

In the above model, \( y_{i} \) is the response variable for the \( i{\text{th}} \) species and \( \varvec{x}_{\varvec{i}} = (x_{i1} , \ldots ,x_{im} ) \) are the covariates associated with the \( i{\text{th}} \) species. The error terms for all species, \( \varvec{\epsilon} = ( \epsilon_{1} , \epsilon_{2} , \ldots , \epsilon_{n} ) \), follow a multivariate normal distribution:

$$ \varvec{\epsilon} \sim \mathcal{N}(0,V\sigma^{2} ) $$

In this equation, \( {\mathbf{V}} \) is the covariance matrix and \( \sigma^{2} \) is the variance. Ordinary linear regression assumes the errors are independent and identically normally distributed, so that V has a constant value along its diagonal and zeros on the off-diagonals. For biological data, however, species exhibit similarity because of common ancestry, which leads to positive values on the off-diagonals. Moreover, the diagonal of V may be heterogeneous if root-to-tip distances vary, as when fossils are included or when branch lengths are based on molecular change rather than absolute dates. As noted above, it is possible to select scaling parameters that transform the branch lengths to better model the evolution of traits on a given tree topology. The parameter \( \lambda \) scales the internal branches (off-diagonal elements of V) between 0 and 1; \( \lambda = 0 \) corresponds to no phylogenetic structure, i.e., a star phylogeny. The parameter \( \kappa \) raises all branch lengths to the power \( \kappa \); thus, \( \kappa = 0 \) corresponds to a phylogeny with equal branch lengths, as might occur when change is speciational.
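As an illustrative sketch (the chapter's own scripts are in R), the following Python snippet shows how \( \lambda \) and \( \kappa \) act on a hypothetical three-species covariance matrix; the matrix values and branch lengths are invented for illustration:

```python
import numpy as np

# Hypothetical covariance matrix V for three species from an ultrametric tree:
# diagonal = root-to-tip distance, off-diagonal = shared branch length.
V = np.array([
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
])

def apply_lambda(V, lam):
    """Scale the off-diagonal (shared-ancestry) covariances by lambda."""
    V_lam = V * lam
    np.fill_diagonal(V_lam, np.diag(V))  # tip variances are left unchanged
    return V_lam

def apply_kappa(branch_lengths, kappa):
    """Raise every branch length to the power kappa."""
    return np.asarray(branch_lengths, dtype=float) ** kappa

# lambda = 0 removes phylogenetic structure: V becomes diagonal (star phylogeny).
star = apply_lambda(V, 0.0)
# kappa = 0 sets all branch lengths equal, as under speciational change.
equal = apply_kappa([0.2, 0.6, 1.3], 0.0)
```

With \( \lambda = 1 \) the matrix is returned unchanged, recovering the untransformed Brownian-motion expectation.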

Hence, the covariance structure \( {\mathbf{V}} \) can be crucial to comparative analyses of species values, and scaling parameters provide important insights into the evolutionary process and degree of phylogenetic signal in the data.

The objective is to select the optimal model with respect to both the covariates and the variance structure. Two variance structures, based on the scaling parameters \( \lambda \) and \( \kappa \), are considered. We aim to select the covariates and the variance structure that best characterize trait evolution, while also obtaining precise estimates of the parameters (regression coefficients, \( \lambda \), \( \kappa \)). Given that only \( \lambda \) and \( \kappa \) are considered, we can rewrite the distribution of \( \varvec{\epsilon} \) as follows:

$$ \varvec{\epsilon} \sim \mathcal{N}(0,\left( {I_{V} \sigma_{\lambda }^{2} + \left( {1 - I_{V} } \right)\sigma_{\kappa }^{2} } \right)\Sigma (T,I_{V} ,\lambda ,\kappa )) $$
(21.2)

In this equation, \( I_{V} \) indicates which variance structure is selected: \( I_{V} = 1 \) when estimating \( \lambda \) and \( I_{V} = 0 \) when estimating \( \kappa \). The parameters \( \sigma_{\lambda }^{2} \) and \( \sigma_{\kappa }^{2} \) are the variances for the \( \lambda \) and \( \kappa \) models. The covariance matrix \( \Sigma (T,I_{V} ,\lambda ,\kappa ) \) is a function of the evolutionary tree \( T \), the indicator \( I_{V} \), \( \lambda \), and \( \kappa \). Henceforth, we use the notation \( \varSigma \) in place of \( \Sigma (T,I_{V} ,\lambda ,\kappa ) \).

In a Bayesian framework, the parameters are treated as random variables and their posterior distributions are investigated. Three types of parameters are included in the above model:

  1. Parameters for tree selection, \( T \). Here, we use a large number of trees to represent uncertainty in the phylogeny that describes evolutionary relationships among the species. A posterior distribution of M trees \( \{ T_{1} , \ldots , T_{M} \} \) is used and treated as a uniform distribution (although a single tree can also be used).

  2. Parameters for variable selection, \( \Theta _{1} = (\varvec{\gamma},\varvec{\beta}) \). These include the indicator variables \( \varvec{\gamma}= (\gamma_{1} , \ldots ,\gamma_{m} ) \), which specify whether each variable is included in the model, and the effect sizes \( \varvec{\beta}= (\beta_{1} , \ldots ,\beta_{m} ) \) for each covariate. The regression coefficients are \( \theta_{i} = \gamma_{i} \times \beta_{i} , i = 1, 2, \ldots , m \).

  3. Parameters for variance selection, \( \Theta _{2} = (I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ) \).

Let \( \varvec{Y} \) be the vector of response values and \( \varvec{X} \) the matrix of m explanatory variables for the n species, as given below:

$$ \varvec{Y} = \left( {y_{1} ,y_{2} , \ldots ,y_{n} } \right)^{T} ,\quad \varvec{X} = \left( {\begin{array}{*{20}c} {x_{11} } & \cdots & {x_{1m} } \\ \vdots & \ddots & \vdots \\ {x_{n1} } & \cdots & {x_{nm} } \\ \end{array} } \right) $$

Then, the joint posterior distribution for all parameters will be

$$ f(\varvec{\gamma},\varvec{ \beta },I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T|\varvec{Y},\varvec{X}) \propto p(\varvec{\gamma},\varvec{ \beta },I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T)f(\varvec{Y}|\varvec{\gamma},\varvec{ \beta },I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,\varvec{X}) $$
(21.3)

In the above equation,

  1. \( p(\varvec{\gamma},\varvec{ \beta },I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T) \) is the prior distribution for all parameters. We assume the priors for tree selection, variable selection, and variance selection are independent, i.e., \( p\left( {\varvec{\gamma},\varvec{\beta},I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T} \right) = p\left( T \right)p\left( {\varvec{\gamma},\varvec{\beta}} \right)p(I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ) \). We further assume that the variable selection prior factors as \( p\left( {\varvec{\gamma},\varvec{\beta}} \right) = p\left(\varvec{\gamma}\right)p\left( {\varvec{\beta}|\varvec{\gamma}} \right) \), where \( p(\varvec{\gamma}) \) follows a non-informative prior and, for each \( i \), \( \beta_{i} |\gamma_{i} \sim \left( {1 - \gamma_{i} } \right)N\left( {\hat{\mu }, S} \right) + \gamma_{i} N(0, \tau^{2} ) \), with \( \hat{\mu }, S, \tau^{2} \) predefined parameters.

  2. \( f(\varvec{Y}|\varvec{\gamma},\varvec{ \beta },I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,\varvec{X}) \) is the probability density function:

$$ f\left( {\varvec{Y} |\varvec{\gamma},\varvec{ \beta },I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,\varvec{X}} \right) = I_{V} N(\varvec{Y}|\varvec{X\theta },\sigma_{\lambda }^{2}\Sigma ) + (1 - I_{V} )N(\varvec{Y}|\varvec{X\theta },\sigma_{\kappa }^{2}\Sigma ) $$

Equation (21.3) is difficult to analyze directly. However, \( \varvec{\beta} \) can be integrated out, which significantly simplifies the calculation. Consequently, we need only consider the posterior distribution \( f(\varvec{\gamma},I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T|Y,X) \). Once this posterior distribution is obtained, \( f(\varvec{\beta}|\varvec{\gamma},I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,Y,X) \) follows a multivariate normal distribution.

Let \( \varvec{X}\left(\varvec{\gamma}\right) \) be the columns of \( \varvec{X} \) with \( \gamma_{j} = 1 \) and \( \Sigma ^{'} =\Sigma ^{ '} \left( {T, I_{V} ,\lambda , \kappa ,\sigma^{2} } \right) = (\frac{1}{{\sigma^{2} }}\varvec{X}\left(\varvec{\gamma}\right)^{T}\Sigma ^{ - 1} \varvec{X}\left(\varvec{\gamma}\right) + \frac{1}{{\tau^{2} }}\varvec{I})^{ - 1} \); then the posterior distribution can be simplified to:

$$ \begin{aligned} & f\left( {\varvec{\gamma},I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T |\varvec{Y},\varvec{X}} \right) = p\left( \gamma \right)p\left( {I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} } \right)p\left( T \right) \\ & \times\left( {I_{V} \frac{{\det \left( {\Sigma ^{'} } \right)^{\frac{1}{2}} }}{{\left( {\sigma_{\lambda }^{2} } \right)^{{\left( {\mathop {\sum }\nolimits\varvec{\gamma}} \right)}} \det \left(\Sigma \right)^{\frac{1}{2}} }}{ \exp }\left( { - \frac{1}{{2\sigma_{\lambda }^{2} }}A_{1} } \right) + (1 - I_{V} )\frac{{\det \left( {\Sigma ^{'} } \right)^{\frac{1}{2}} }}{{\left( {\sigma_{\kappa }^{2} } \right)^{{\left( {\mathop {\sum }\nolimits\varvec{\gamma}} \right)}} \det \left(\Sigma \right)^{\frac{1}{2}} }}{ \exp }\left( { - \frac{1}{{2\sigma_{\kappa }^{2} }}A_{2} } \right)} \right) \\ \end{aligned} $$
(21.4)

where \( A_{1} = \varvec{Y}^{'}\Sigma ^{ - 1} \varvec{Y} - \frac{1}{{\sigma_{\lambda }^{2} }}(\varvec{Y}^{'}\Sigma ^{ - 1} \varvec{X}\left(\varvec{\gamma}\right)\Sigma ^{'} \varvec{X}\left(\varvec{\gamma}\right)^{'}\Sigma ^{ - 1} \varvec{Y}) \) and \( A_{2} = \varvec{Y}^{'}\Sigma ^{ - 1} \varvec{Y} - \frac{1}{{\sigma_{\kappa }^{2} }}(\varvec{Y}^{'}\Sigma ^{ - 1} \varvec{X}\left(\varvec{\gamma}\right)\Sigma ^{'} \varvec{X}\left(\varvec{\gamma}\right)^{'}\Sigma ^{ - 1} \varvec{Y}). \)

The posterior distribution in Eq. (21.4) is difficult to sample directly; hence, we generate posterior samples using MCMC (Liu 2003) to select the optimal model. Specifically, we use Gibbs sampling, an algorithm that generates a sequence of samples from the joint probability distribution of two or more random variables by repeatedly drawing from their conditional distributions. In each iteration of Gibbs sampling, we use the following procedure to obtain posterior samples:

  1. Simulate \( T_{k} \) from \( \{ T_{1} , \ldots ,T_{M} \} \);

  2. Simulate \( \varvec{\gamma} \) from \( f(\varvec{\gamma}|I_{V} ,\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,\varvec{Y},\varvec{X}) \);

  3. Simulate \( I_{V} \) from \( f(I_{V} |{\varvec{\upgamma}},\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,\varvec{Y},\varvec{X}) \);

  4. Simulate \( \lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} \) from \( f(\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} |{\varvec{\upgamma}},I_{V} ,T,\varvec{Y},\varvec{X}) \);

  5. Simulate \( \varvec{\beta} \) from \( f\left( {\varvec{\beta}|{\varvec{\upgamma}},\lambda , \kappa ,\sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} ,T,\varvec{Y},\varvec{X}} \right). \)

Since \( \varvec{\gamma} \) (Step 2) and \( I_{V} \) (Step 3) follow Bernoulli distributions, their posterior samples can be drawn directly. In Step 4, \( \lambda \) and \( \kappa \) are updated with the Metropolis–Hastings algorithm (Hastings 1970), and \( \sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} \) are drawn from inverse gamma distributions. In Step 5, \( \varvec{\beta} \) follows a multivariate normal distribution.
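The Metropolis–Hastings update for \( \lambda \) can be sketched as follows in Python (the chapter's implementation is in R; the trait values, covariance matrix, prior, and step size below are invented for illustration, with a flat prior on \( \lambda \) so the acceptance ratio reduces to a likelihood ratio):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: three species' trait values and their tree covariance matrix.
y = np.array([0.3, -0.1, 0.5])
V = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.2],
              [0.2, 0.2, 1.0]])

def log_lik(lam, y, V, sigma2=1.0):
    """Log-likelihood of y ~ N(0, sigma2 * V_lambda), lambda scaling off-diagonals."""
    V_lam = V * lam
    np.fill_diagonal(V_lam, np.diag(V))
    _, logdet = np.linalg.slogdet(sigma2 * V_lam)
    quad = y @ np.linalg.solve(sigma2 * V_lam, y)
    return -0.5 * (logdet + quad)

def mh_step(lam, step=0.1):
    """One Metropolis-Hastings update, reflecting proposals back into [0, 1]."""
    prop = lam + rng.normal(0.0, step)
    if prop < 0.0:
        prop = -prop          # reflect at 0
    if prop > 1.0:
        prop = 2.0 - prop     # reflect at 1
    if np.log(rng.uniform()) < log_lik(prop, y, V) - log_lik(lam, y, V):
        return prop           # accept
    return lam                # reject: keep the current value

lam, chain = 0.5, []
for _ in range(500):
    lam = mh_step(lam)
    chain.append(lam)
```

In the full sampler this update is nested inside the Gibbs sweep, with the regression mean subtracted from \( \varvec{Y} \) and the current tree and variance draws plugged into \( \varSigma \).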

After N posterior samples, \( \left\{ {\Theta _{1}^{\left( 1 \right)} ,\Theta _{1}^{\left( 2 \right)} , \ldots ,\Theta _{1}^{\left( N \right)} } \right\}, \left\{ {\Theta _{2}^{\left( 1 \right)} ,\Theta _{2}^{\left( 2 \right)} , \ldots ,\Theta _{2}^{\left( N \right)} } \right\} \), and \( \{ T^{(1)} , T^{(2)} , \ldots , T^{(N)} \} \), have been obtained, we must decide which model to select. Selection can proceed under several criteria, depending on the goal:

  1. Model with the highest posterior probability

Let \( P\left( {M_{i} |\varvec{X}, \varvec{Y}} \right), i = 1,2 \ldots ,2^{m + 1} \), be the posterior probability of the \( i{\text{th}} \) candidate model, estimated as the proportion of posterior samples in which model \( i \) appears. The model with the highest posterior probability can be selected as optimal:

$$ M_{*} = {\text{argmax}}_{i} P(M_{i} |\varvec{X},\varvec{Y}) $$
  2. Inclusion probability for variables (model selection)

The inclusion probability of the \( j{\text{th}} \) variable is defined as \( P(\gamma_{j} = 1|X, Y ) \), a marginal probability across all posterior samples. It can be estimated as \( P\left( {\gamma_{j} = 1 |X, Y } \right) = \frac{{\mathop {\sum }\nolimits_{k} \gamma_{j}^{(k)} }}{N}. \)

  3. Probability of the variance structure

The probability of the \( \lambda \) model versus the \( \kappa \) model, \( P(I_{V} = 1|\varvec{X}, \varvec{Y} ) \), can be obtained as \( P\left( {I_{V} = 1 |\varvec{X}, \varvec{Y} } \right) = \frac{{\mathop {\sum }\nolimits_{k} I_{V}^{(k)} }}{N}. \)
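Both quantities are simple proportions of the posterior draws. A minimal Python sketch with invented indicator samples:

```python
import numpy as np

# Hypothetical posterior draws: rows = MCMC iterations, columns = gamma_j indicators.
gamma_samples = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
])
# Hypothetical draws of the variance-structure indicator I_V (1 = lambda model).
I_V_samples = np.array([1, 1, 0, 1])

# P(gamma_j = 1 | X, Y): proportion of draws that include variable j.
inclusion_prob = gamma_samples.mean(axis=0)

# P(I_V = 1 | X, Y): proportion of draws using the lambda structure.
p_lambda = I_V_samples.mean()
```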

  4. Inference on regression coefficients

For a specific candidate model \( M_{i} \), inference on its parameters can be obtained directly from the posterior samples for \( M_{i} \). More generally, the effect size of a given covariate can be estimated through Bayesian model averaging (BMA) (O’Hara and Sillanpaay 2009). For example, \( \beta_{i} \), the effect size of the \( i{\text{th}} \) covariate, can be estimated by its posterior mean:

$$ \hat{\beta }_{i} = \mathop \sum \limits_{k} P\left( {M_{k} |\varvec{X}, \varvec{Y}} \right)E_{{M_{k} }} (\beta_{i} ) $$

where \( E_{{M_{k} }} (\beta_{i} ) \) is the average of the posterior samples of \( \beta_{i} \) under model \( M_{k} \). This estimator is simply the overall mean of the pooled posterior samples \( \beta_{i}^{(k)} \). So we can use

$$ {\text{Var}}\left( {\hat{\beta }_{i} } \right) = {\text{Var}}(\beta_{i}^{(k)} ) $$

as the estimator for the variance of \( \hat{\beta }_{i} \).
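The equivalence between the BMA estimator and the pooled posterior mean can be checked numerically; the model labels and draws below are invented:

```python
import numpy as np

# Hypothetical posterior draws: each position pairs a sampled model with a beta_i draw.
model_ids = np.array([0, 0, 1, 1, 1, 0, 1, 1])
beta_draws = np.array([0.9, 1.1, 0.4, 0.6, 0.5, 1.0, 0.5, 0.6])

models, counts = np.unique(model_ids, return_counts=True)
post_prob = counts / counts.sum()        # P(M_k | X, Y) from sample frequencies
cond_mean = np.array([beta_draws[model_ids == m].mean() for m in models])

# BMA estimate: sum_k P(M_k | X, Y) * E_{M_k}(beta_i)
beta_bma = float(np.sum(post_prob * cond_mean))
# Variance of the pooled draws serves as the variance estimator.
var_bma = float(beta_draws.var())
```

Because the model probabilities are themselves sample frequencies, the weighted sum collapses to the mean of the pooled draws, which is what the text asserts.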

  • Model Checking

Bayesian model checking (Gelman 2004) can be used to check whether the model is consistent with the data. Consider data \( \varvec{Y},\varvec{X} \) and corresponding posterior samples, \( \left\{ {\Theta _{1}^{\left( 1 \right)} ,\Theta _{1}^{\left( 2 \right)} , \ldots ,\Theta _{1}^{\left( N \right)} } \right\}, \left\{ {\Theta _{2}^{\left( 1 \right)} ,\Theta _{2}^{\left( 2 \right)} , \ldots ,\Theta _{2}^{\left( N \right)} } \right\} \) and \( \{ T^{(1)} , T^{(2)} , \ldots , T^{(N)} \} \). Under the assumption of a linear model, we can use each posterior sample to generate one predicted (i.e., “fake”) \( \varvec{Y}^{(k)} \) given \( \{ \varvec{X}, T^{\left( k \right)} ,\Theta _{1}^{\left( k \right)} ,\Theta _{2}^{(k)} \} \) in the following way:

$$ \varvec{Y}^{(k)} \sim N\left( {\varvec{X}\theta^{\left( k \right)} ,\left( {I_{V}^{\left( k \right)} \left( {\sigma_{\lambda }^{2} } \right)^{\left( k \right)} + \left( {1 - I_{V}^{\left( k \right)} } \right)\left( {\sigma_{\kappa }^{2} } \right)^{\left( k \right)} } \right)\Sigma ^{{\left( {\text{k}} \right)}} } \right) $$

where \( \Sigma ^{\left( k \right)} =\Sigma (T^{\left( k \right)} , I_{V}^{\left( k \right)} ,\lambda^{\left( k \right)} ,\kappa^{(k)} ) \). Thus, one fake \( \varvec{Y}^{(k)} \) can be obtained for each posterior sample. A predefined summary \( z^{(k)} = f(\varvec{Y}^{(k)} ) \) is computed for each \( \varvec{Y}^{(k)} , k = 1,2, \ldots ,N \), and compared to \( z_{C} = f(\varvec{Y}) \) obtained from the real data. The comparison between \( \{ z^{(k)} \} \) and \( z_{C} \) evaluates the validity of the model.

The logic of model checking is that, if the model is valid, the generated fake \( {\mathbf{Y}} \)s should be statistically similar to the observed \( \varvec{Y} \). The choice of the function \( f \) depends on the dataset and model, but common choices include the variance (\( z^{(k)} = {\text{var}}(\varvec{Y}^{(k)} ) \)) and the median (\( z^{(k)} = {\text{median}}(\varvec{Y}^{(k)} ) \)). We check \( z_{C} \) against the distribution of \( \{ z^{(k)} \} \) and compute a two-sided p-value; if the p-value is smaller than 0.05, we conclude that the data are not consistent with the model.
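A sketch of this posterior predictive check in Python; for simplicity the fake data sets here are drawn directly from a standard normal rather than from a fitted phylogenetic model:

```python
import numpy as np

rng = np.random.default_rng(7)

def ppc_pvalue(y_obs, y_fakes, f=np.var):
    """Two-sided posterior predictive p-value for a summary statistic f."""
    z_c = f(y_obs)                              # statistic on the real data
    z = np.array([f(yk) for yk in y_fakes])     # statistic on each fake data set
    p_upper = np.mean(z >= z_c)
    return 2.0 * min(p_upper, 1.0 - p_upper)

# Invented example: observed data and 500 fake data sets from the same model,
# so the check should usually NOT flag a discrepancy.
y_obs = rng.normal(0.0, 1.0, size=20)
y_fakes = [rng.normal(0.0, 1.0, size=20) for _ in range(500)]
p = ppc_pvalue(y_obs, y_fakes)
consistent = p >= 0.05
```

An observed data set whose variance sits far outside the fake-data distribution would return a p-value near zero, flagging model misfit.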

Prediction of an unknown response for a species (Gelman 2004) is also available in this Bayesian framework, for example, if one wishes to predict a value for a species that has not yet been studied or to investigate whether a particular species deviates from expectations of the evolutionary model (Organ et al. 2011). Consider a species \( n_{1} \) with known tree \( T_{\text{new}} \) and explanatory variables \( \varvec{X}_{\text{new}} \), but with the response variable \( \varvec{Y}_{\text{new}} \) missing. Given the posterior samples \( \left\{ {\Theta _{1}^{\left( 1 \right)} , \ldots ,\Theta _{1}^{\left( N \right)} } \right\}, \left\{ {\Theta _{2}^{\left( 1 \right)} , \ldots ,\Theta _{2}^{\left( N \right)} } \right\} \), and \( \left\{ {T^{\left( 1 \right)} , \ldots , T^{\left( N \right)} } \right\} \), the joint distribution of \( \left( {\varvec{Y}^{\left( k \right)} ,\varvec{Y}_{\text{new}}^{\left( k \right)} } \right)^{T} \) given \( \varvec{X}_{\text{new}} ,T_{\text{new}} ,\Theta _{1}^{\left( k \right)} ,\Theta _{2}^{\left( k \right)} \) is:

$$ \left( {\varvec{Y}^{\left( k \right)} ,\varvec{Y}_{\text{new}}^{\left( k \right)} } \right) \sim \mathcal{N}\left( {\left( {\varvec{X},\varvec{X}_{\text{new}} } \right)^{T} \theta^{\left( k \right)} ,\left( {I_{V}^{\left( k \right)} \left( {\sigma_{\lambda }^{2} } \right)^{\left( k \right)} + \left( {1 - I_{V}^{\left( k \right)} } \right)\left( {\sigma_{\kappa }^{2} } \right)^{\left( k \right)} } \right)\Sigma \left( {T^{(k)} \mathop \cup \nolimits T_{\text{new}} ,I_{V}^{\left( k \right)} ,\lambda^{\left( k \right)} ,\kappa^{(k)} } \right)} \right) $$

for each posterior sample. Let \( \Sigma _{\text{new}}^{(k)} =\Sigma \left( {T_{\text{new}} ,I_{V}^{\left( k \right)} ,\lambda^{\left( k \right)} ,\kappa^{(k)} } \right) \); the covariance matrix for the combined tree then satisfies:

$$ \Sigma \left( {T^{\left( k \right)} \mathop \cup \nolimits T_{\text{new}} ,I_{V}^{\left( k \right)} ,\lambda^{\left( k \right)} ,\kappa^{\left( k \right)} } \right) = \left( {\begin{array}{*{20}c} {\Sigma ^{{\left( k \right)}} } & {\Sigma _{12}^{\left( k \right)} } \\ {\Sigma _{21}^{\left( k \right)} } & {\Sigma _{\text{new}}^{\left( k \right)} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\Sigma _{11}^{\left( k \right)} } & {\Sigma _{12}^{\left( k \right)} } \\ {\Sigma _{21}^{\left( k \right)} } & {\Sigma _{22}^{\left( k \right)} } \\ \end{array} } \right) $$

Since we have already observed \( \varvec{Y} \), the distribution of \( \varvec{Y}_{\text{new}}^{\left( k \right)} \) can be obtained from the conditional normal distribution:

$$ \varvec{Y}_{\text{new}}^{(k)} |\varvec{Y}\sim N(\overline{{\mu^{\left( k \right)} }} ,\overline{{\Sigma ^{{({\text{k}})}} }} ) $$

where

$$ \overline{{\mu^{\left( k \right)} }} = \varvec{X}_{\text{new}} \theta^{(k)} +\Sigma _{21}^{\left( k \right)} \left( {\Sigma _{11}^{{\left( {\text{k}} \right)}} } \right)^{ - 1} (\varvec{Y} - \varvec{X}\theta^{(k)} ) $$
$$ \overline{{\Sigma ^{{({\text{k}})}} }} =\Sigma _{22}^{\left( k \right)} -\Sigma _{21}^{\left( k \right)} \left( {\Sigma _{11}^{{\left( {\text{k}} \right)}} } \right)^{ - 1}\Sigma _{12}^{\left( k \right)} $$

So for each posterior sample, one simulated \( \varvec{Y}_{\text{new}}^{(k)} \) can be obtained. Then, we can use the median and variance of predictive draws \( \{ \varvec{Y}_{\text{new}}^{(k)} , k = 1, 2, \ldots ,N\} \) to make predictions for values of the response variable in the new species. If the observed value for the species falls outside of, for example, the 95 % credible interval of predictions, one might infer that an exceptional amount of evolutionary change has occurred.
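The two conditional-normal formulas can be sketched numerically. In the toy example below, the covariance matrix is a random positive-definite stand-in for the phylogenetic \( \Sigma \) (in a real analysis it would come from the tree), and all variable names are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 5  # number of observed species
# Random positive-definite stand-in for the combined (n+1) x (n+1)
# phylogenetic covariance matrix over observed + new species.
A = rng.normal(size=(n + 1, n + 1))
Sigma = A @ A.T + (n + 1) * np.eye(n + 1)

S11 = Sigma[:n, :n]   # among observed species
S12 = Sigma[:n, n:]   # observed vs. new species
S21 = Sigma[n:, :n]
S22 = Sigma[n:, n:]   # new species

X = rng.normal(size=(n, 2))       # design matrix for observed species
x_new = rng.normal(size=(1, 2))   # predictors for the new species
theta = np.array([0.5, -1.0])     # one posterior draw of the coefficients
y = X @ theta + rng.normal(size=n)

# Conditional mean and variance of Y_new given Y (the formulas above):
mu_bar = (x_new @ theta + S21 @ np.linalg.solve(S11, y - X @ theta)).item()
var_bar = (S22 - S21 @ np.linalg.solve(S11, S12)).item()

y_new = rng.normal(mu_bar, np.sqrt(var_bar))  # one predictive draw
```

Repeating the last step for each posterior sample yields the set of predictive draws from which the credible interval is computed.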

Simulation Test of Method Implemented in BayesModelS

We use simulated data to evaluate the performance of BayesModelS, focusing on parameter estimation (but not prediction), and compare our procedure with stepwise regression. For each dataset, we simulated predictor variables \( \varvec{X} \) and a response variable \( \varvec{Y} \) with known associations among the variables on a single phylogeny taken from a posterior distribution of 100 phylogenies for 87 primate species. The predictor values for each species are independently and identically distributed as \( N(0, 1) \). For BayesModelS, we then ran analyses across the 100 trees; for stepwise regression, we used a single tree, identical to the tree used to generate the data.

Two different sets of simulated data were used. The first dataset checks whether Bayesian variable selection correctly identifies the variables to include in the statistical model, as compared to stepwise regression; the posterior inclusion probabilities of significant and insignificant effects were also evaluated. Consider a regression model with 10 covariates, whose coefficients are assumed to follow the distribution:

$$ \beta_{i} \sim I_{S} \times \mathcal{N}\left( {\mu , \sigma_{1}^{2} } \right) + \left( {1 - I_{S} } \right) \times\mathcal {N}\left( {0, \sigma_{2}^{2} } \right), \forall i $$

where \( \mu ,\sigma_{1}^{2} ,\sigma_{2}^{2} \) are predefined as \( 1, 0.1, 0.01 \), respectively, and \( I_{S} \) indicates whether the effect is active (i.e., nonzero). The response variable \( \varvec{Y} \) can then be simulated from Eq. (21.1). Datasets with \( I_{V} = 1 \), \( \lambda /\kappa \sim {\text{Unif}}\;[0, 1] \), and \( \sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} = 0.01, 0.02, 0.03 \) were used.
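The coefficient-generating mixture can be sketched as follows. This is a toy illustration of the distribution above; the random-indicator draw is an assumption made for the example, and the values are not the ones used in the chapter's simulations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Mixture coefficients for 10 covariates, with mu, sigma1^2, sigma2^2 = 1, 0.1, 0.01:
# active effects are drawn near 1, inactive effects near 0.
mu, s1, s2 = 1.0, np.sqrt(0.1), np.sqrt(0.01)
I_S = rng.integers(0, 2, size=10)  # indicator: is the effect active (nonzero)?
beta = I_S * rng.normal(mu, s1, size=10) + (1 - I_S) * rng.normal(0.0, s2, size=10)
```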

In reality, the true regression coefficients are sometimes neither zero nor large (O’Hara and Sillanpää 2009), unlike in the first dataset; coefficient sizes may instead taper toward zero. For the second dataset, we therefore consider a regression framework similar to that of O’Hara and Sillanpää (2009), with the following regression model:

$$ y_{i} = \alpha + \mathop \sum \limits_{j = 1}^{m} \theta_{j} x_{ij} + \epsilon_{i} $$

For the simulations, known values of \( \alpha = { \log }(10) \) and \( \sigma_{\lambda }^{2} , \sigma_{\kappa }^{2} = 0.01, 0.02, 0.03 \) were used. The covariate values \( x_{ij} \) were simulated independently from a standard normal distribution, \( N(0,1) \). We also assume \( m = 21 \) and \( \boldsymbol{\epsilon}\sim \mathcal{N} (0, \varSigma (T, 1, 0.5, 0.5)) \) or \( \boldsymbol{\epsilon}\sim\mathcal {N} (0, \varSigma (T, 0, 0.5, 0.5)) \) for the models of \( \lambda \) and \( \kappa \), respectively. The regression coefficients \( \theta_{j} \) were generated equally spaced between \( a - bk \) and \( a + bk \), where \( a = 0 \) and \( b = 0.05 \). Twenty datasets were generated, for \( k = 1, 2, \ldots , 20 \).
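The tapered coefficient sets can be generated directly; a one-line sketch of the design just described:

```python
import numpy as np

# m = 21 coefficients equally spaced between a - b*k and a + b*k,
# with a = 0 and b = 0.05; one coefficient set per dataset k = 1, ..., 20.
a, b, m = 0.0, 0.05, 21
coef_sets = {k: np.linspace(a - b * k, a + b * k, m) for k in range(1, 21)}
```

Larger \( k \) thus spreads the coefficients further from zero, making the effects easier to detect.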

We used several performance measures to evaluate BayesModelS. We checked whether BayesModelS can successfully identify the correct model, defined as the model with the highest posterior probability. For stepwise regression, the optimal model was chosen using both forward and backward stepwise procedures. Repeated simulations were conducted to check the percentage of time the two methods identify the correct model.

Moreover, the median posterior inclusion probability for each covariate in Bayesian Model Selection was evaluated and compared, through repeated simulations, to the percentage of inclusion for each covariate under stepwise regression. We repeated the following 500 times: a tree is used to generate data, and then the Bayesian method and stepwise regression are each used to estimate the model for these data. Since the true model is known, we can record whether each method's answer is right or wrong, and we assess the statistical performance of BayesModelS and stepwise regression from this set of results.

The percentage of time each method identifies the correct model with the simulated data is shown in Fig. 21.9.

Fig. 21.9

a Percentage of time each method identifies the correct model (\( \sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} = 0.01 \)). b Percentage of time each method identifies the correct model (\( \sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} = 0.02 \)). c Percentage of time each method identifies the correct model (\( \sigma_{\lambda }^{2} ,\sigma_{\kappa }^{2} = 0.03 \))

In more than 90 % of the simulations, the Bayesian Model Selection procedure identified the correct model, regardless of the \( \sigma_{\lambda }^{2} /\sigma_{\kappa }^{2} \) value. Stepwise regression performed well when the number of significant effects was high, but poorly when the number of significant effects was low, due to a high Type I error rate (Mundry and Nunn 2009).

Next, model checking and prediction with Bayesian Model Selection were conducted. One fixed sample from the second dataset was simulated with \( I_{V} = 1 \), \( \lambda = 0.5, \sigma_{\lambda }^{2} = 0.1, k = 20 \). We used four functions to check the validity of the model: mean, variance, minimum, and maximum. The checking results can be found in Fig. 21.10. We find that the model is consistent with the data, which is not surprising since the data were generated from the same linear model.

Fig. 21.10

a Model checking for variance. b Model checking for mean. c Model checking for minimum. d Model checking for maximum. Red line indicates the actual data

Finally, we used BayesModelS to predict unknown species in a simulation context. Each time, one species was designated as “missing” and the predict() function was used to predict its value based on the remaining 86 species. The predictive sample was compared to the true response, as shown in Fig. 21.11. The true response of most species falls within the 95 % credible interval of the prediction, indicating that Bayesian Model Selection can effectively predict values for unknown species.
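This leave-one-out exercise amounts to a coverage count, which can be sketched as follows. The predictive draws below come from a made-up generator rather than from BayesModelS, and all names are hypothetical; only the interval-and-count logic mirrors the check described above.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 87
y_true = rng.normal(size=n)                 # true responses for the 87 species
# Hypothetical predictive draws for each held-out species: point predictions
# with some error, plus predictive noise (stand-in for posterior predictive draws).
center = y_true + 0.5 * rng.normal(size=n)
draws = center[:, None] + rng.normal(size=(n, 1000))

# 95 % credible interval per species, and the number of species covered.
lo, hi = np.percentile(draws, [2.5, 97.5], axis=1)
covered = int(np.sum((y_true >= lo) & (y_true <= hi)))
# With well-calibrated predictions, roughly 95 % of the 87 species
# should fall inside their 95 % credible intervals.
```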

Fig. 21.11

Prediction of 87 species with Bayesian Model Selection. Blue points are for median of predictive sample, and red points are true response. The blue squares are 50 % credible interval, and blue lines are 95 % credible interval for predictive samples. Forty of the predictions are outside the 50 % intervals (relative to expectation of 43.5), while 2 of the predictions are outside the 95 % interval (relative to expectation of 4.3)


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg


Nunn, C.L., Zhu, L. (2014). Phylogenetic Prediction to Identify “Evolutionary Singularities”. In: Garamszegi, L. (eds) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_21
