Distributions of topological tree metrics between a species tree and a gene tree

Annals of the Institute of Statistical Mathematics

Abstract

In order to conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method for analyzing discordance between gene trees, it is key to choose a distance between trees that is both “biologically meaningful” and “statistically well-distributed”. Thus, in this paper, we study the distributions of three tree distance metrics (the edge difference, the path difference, and the precise K interval cospeciation distance) between two trees. First, we focus on the distributions of the three tree distances between two random unrooted trees with n leaves (\(n \ge 4\)); we then focus on their distributions between a fixed rooted species tree with n leaves and a random gene tree with n leaves generated under the coalescent process given that species tree. We present theoretical results as well as a simulation study on these distributions.
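
Two of the three metrics named above have standard definitions in the tree-comparison literature. As a hedged illustration, assuming the edge difference is the partition (Robinson–Foulds) count of non-trivial splits present in exactly one tree, and the path difference is the Euclidean distance between the vectors of leaf-to-leaf path lengths (as in Steel and Penny's treatment), the Python sketch below computes both for unrooted binary trees given as edge lists on the same leaf set, and estimates the edge-difference distribution for two random unrooted trees by Monte Carlo. The paper's own conventions (for example, halving or normalizing the edge count) may differ. The function names, the internal node labels `h0`, `h1`, and so on, and the leaf-insertion sampler are illustrative assumptions; the precise K interval cospeciation distance and coalescent gene-tree simulation are not covered.

```python
# Sketch (assumed definitions): edge difference and path difference between
# unrooted binary trees, plus a Monte Carlo look at random-tree distributions.
import math
import random
from collections import Counter, deque


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def leaves(adj):
    """Leaves are the degree-1 nodes, returned in sorted order."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)


def side_leaves(adj, start, blocked):
    """Leaves reachable from `start` without crossing the edge to `blocked`."""
    seen, stack, found = {start, blocked}, [start], set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:
            found.add(node)
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return found


def splits(adj):
    """Non-trivial splits, each stored as the side NOT holding a reference leaf."""
    taxa = leaves(adj)
    ref = taxa[0]
    result = set()
    for u in adj:
        for v in adj[u]:
            side = side_leaves(adj, v, u)
            if ref in side:
                side = set(taxa) - side
            if 2 <= len(side) <= len(taxa) - 2:  # drop trivial (leaf-edge) splits
                result.add(frozenset(side))
    return result


def edge_difference(adj1, adj2):
    """Count of non-trivial splits present in exactly one of the two trees."""
    return len(splits(adj1) ^ splits(adj2))


def path_lengths(adj):
    """Number of edges on the path between every pair of leaves (BFS per leaf)."""
    taxa = leaves(adj)
    out = {}
    for a in taxa:
        dist, queue = {a: 0}, deque([a])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for b in taxa:
            if a < b:
                out[(a, b)] = dist[b]
    return out


def path_difference(adj1, adj2):
    """Euclidean distance between the two leaf-pair path-length vectors."""
    p1, p2 = path_lengths(adj1), path_lengths(adj2)
    return math.sqrt(sum((p1[k] - p2[k]) ** 2 for k in p1))


def random_unrooted_tree(taxa, rng):
    """Uniform random unrooted binary topology via random leaf insertion.

    Assumes len(taxa) >= 3 and that no taxon is named like "h<int>",
    since those hypothetical labels are used for internal nodes.
    """
    a, b, c = taxa[:3]
    edges = [(a, "h0"), (b, "h0"), (c, "h0")]
    for i, leaf in enumerate(taxa[3:], start=1):
        # Subdivide a uniformly chosen edge and hang the new leaf there.
        u, v = edges.pop(rng.randrange(len(edges)))
        mid = f"h{i}"
        edges += [(u, mid), (mid, v), (mid, leaf)]
    return build_adjacency(edges)


if __name__ == "__main__":
    rng = random.Random(0)
    taxa = [f"t{i}" for i in range(1, 9)]  # n = 8 leaves
    counts = Counter(
        edge_difference(random_unrooted_tree(taxa, rng),
                        random_unrooted_tree(taxa, rng))
        for _ in range(2000)
    )
    for d in sorted(counts):
        print(d, counts[d] / 2000)  # empirical P(edge difference = d)
```

For the two quartets ((A,B),(C,D)) and ((A,C),(B,D)), both functions return 2 under these assumed definitions. Inserting each new leaf on a uniformly chosen edge gives every labeled unrooted binary topology the same probability, because each such tree has exactly one insertion history and the number of edge choices at each step does not depend on earlier choices.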

Figs. 1–11

Acknowledgments

The authors would like to thank the referees for their very useful comments, which helped improve the manuscript.

Author information

Correspondence to Ruriko Yoshida.

About this article

Cite this article

Xi, J., Xie, J. & Yoshida, R. Distributions of topological tree metrics between a species tree and a gene tree. Ann Inst Stat Math 69, 647–671 (2017). https://doi.org/10.1007/s10463-016-0557-x

  • DOI: https://doi.org/10.1007/s10463-016-0557-x