Inferring Local Genealogies on Closely Related Genomes

  • Conference paper
Comparative Genomics (RECOMB-CG 2017)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 10562))

Abstract

The relationship between the evolution of a set of genomes and that of the individual loci therein can be very complex. For example, in eukaryotic species, meiotic recombination combined with the effects of random genetic drift results in loci whose genealogies differ from each other as well as from the phylogeny of the species or populations, a phenomenon known as incomplete lineage sorting (ILS). The most common practice for inferring local genealogies of individual loci is to slide a fixed-width window across an alignment of the genomes and infer a phylogenetic tree from the sequence alignment of each window. However, at the evolutionary scale where ILS is extensive, the phylogenetic signal within each window is often too low to infer an accurate local genealogy. In this paper, we propose a hidden Markov model (HMM) based method for inferring local genealogies conditional on a known species tree. The method borrows ideas from the work on coalescent HMMs, yet approximates the model parameterization to focus on computationally efficient inference of local genealogies rather than on obtaining detailed model parameters. We also show how the method extends to cases that involve hybridization in addition to recombination and ILS. We demonstrate the performance of our method on synthetic data and one empirical data set, and compare it to the sliding-window approach, which is arguably the most commonly used technique for inferring local genealogies.
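
To make the sliding-window baseline and its weak-signal problem concrete, the following is a minimal Python sketch, not the authors' implementation. The toy four-sequence alignment and the helper names `sliding_windows` and `informative_sites` are assumptions introduced here for illustration. The sketch cuts an alignment into fixed-width windows and counts parsimony-informative columns in each, as a rough proxy for how much phylogenetic signal a per-window tree inference would have to work with.

```python
from typing import Dict, Iterator, List, Tuple


def sliding_windows(length: int, width: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) column ranges of fixed width."""
    for start in range(0, length - width + 1, step):
        yield start, start + width


def informative_sites(alignment: Dict[str, str], start: int, end: int) -> int:
    """Count parsimony-informative columns in columns [start, end).

    A column counts as informative when at least two distinct
    nucleotides each occur in at least two sequences.
    """
    count = 0
    for col in range(start, end):
        freq: Dict[str, int] = {}
        for seq in alignment.values():
            ch = seq[col]
            if ch in "ACGT":  # ignore gaps and ambiguity codes
                freq[ch] = freq.get(ch, 0) + 1
        if sum(1 for n in freq.values() if n >= 2) >= 2:
            count += 1
    return count


# Hypothetical toy alignment (assumption): four taxa, twelve columns.
alignment = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGAACGT",
    "C": "ACGTTCGAACGT",
    "D": "ACGTTCGTACGT",
}

# Non-overlapping windows of width 4; a real analysis would use far wider windows.
per_window: List[Tuple[int, int, int]] = [
    (s, e, informative_sites(alignment, s, e))
    for s, e in sliding_windows(12, 4, 4)
]

for start, end, n in per_window:
    print(f"window [{start}, {end}): {n} informative site(s)")
```

In this toy case only the middle window contains any informative columns, and the two it contains support different groupings of the taxa. Windows with no informative columns give a tree-inference method nothing to distinguish topologies with, which is the situation the abstract describes for closely related genomes.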

Part of this research was conducted while RALE was funded by a training fellowship from the National Library of Medicine (Award T15LM007093; PD Lydia E. Kavraki). Furthermore, research was funded in part by NSF grants CCF-1541979 and DMS-1547433.

Author information

Corresponding author

Correspondence to Luay Nakhleh.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Elworth, R.A.L., Nakhleh, L. (2017). Inferring Local Genealogies on Closely Related Genomes. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_12

  • DOI: https://doi.org/10.1007/978-3-319-67979-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67978-5

  • Online ISBN: 978-3-319-67979-2

  • eBook Packages: Computer Science, Computer Science (R0)
