Abstract
The relationship between the evolution of a set of genomes and that of the individual loci therein can be very complex. For example, in eukaryotic species, meiotic recombination combined with the effects of random genetic drift results in loci whose genealogies differ from each other as well as from the phylogeny of the species or populations, a phenomenon known as incomplete lineage sorting (ILS). The most common practice for inferring local genealogies of individual loci is to slide a fixed-width window across an alignment of the genomes and infer a phylogenetic tree from the sequence alignment of each window. However, at the evolutionary scale where ILS is extensive, the phylogenetic signal within each window is often too low to infer an accurate local genealogy. In this paper, we propose a hidden Markov model (HMM) based method for inferring local genealogies conditional on a known species tree. The method borrows ideas from the work on coalescent HMMs, yet approximates the model parameterization to focus on computationally efficient inference of local genealogies rather than on obtaining detailed model parameters. We also show how the method extends to cases that involve hybridization in addition to recombination and ILS. We demonstrate the performance of our method on synthetic data and one empirical data set, and compare it to the sliding-window approach, which is arguably the most commonly used technique for inferring local genealogies.
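To make the sliding-window baseline concrete, the following is a minimal sketch of its windowing step only, not the authors' implementation. The function names `sliding_windows` and `variable_sites`, the dictionary representation of the alignment, and the toy sequences are assumptions introduced for illustration; inferring a tree from each window (for example with a maximum likelihood program) is not shown.

```python
from typing import Dict, Iterator, Tuple

Alignment = Dict[str, str]  # hypothetical representation: taxon name -> aligned sequence


def sliding_windows(alignment: Alignment, width: int, step: int) -> Iterator[Tuple[int, Alignment]]:
    """Yield (start, sub-alignment) pairs for fixed-width windows along the alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Only full-width windows are produced; a trailing partial window is dropped.
    for start in range(0, n_sites - width + 1, step):
        yield start, {name: seq[start:start + width] for name, seq in alignment.items()}


def variable_sites(window: Alignment) -> int:
    """Count alignment columns in which not all taxa share the same character."""
    columns = zip(*window.values())
    return sum(1 for column in columns if len(set(column)) > 1)


# Toy alignment (made up for illustration, not data from the paper).
toy = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGAACGT",
    "C": "ACGTTCGAACGT",
}

# Number of variable sites in each non-overlapping window of width 4.
signal = [(start, variable_sites(window)) for start, window in sliding_windows(toy, width=4, step=4)]
```

Counting variable sites per window is a crude proxy for the phylogenetic signal the abstract refers to: a window with few or no variable columns carries little information for resolving a local genealogy, which is the limitation that motivates the HMM-based approach.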
Part of this research was conducted while RALE was funded by a training fellowship from the National Library of Medicine (Award T15LM007093; PD Lydia E. Kavraki). This research was also funded in part by NSF grants CCF-1541979 and DMS-1547433.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Elworth, R.A.L., Nakhleh, L. (2017). Inferring Local Genealogies on Closely Related Genomes. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science(), vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_12
DOI: https://doi.org/10.1007/978-3-319-67979-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer Science (R0)