Abstract
We introduce the first polynomial-time phylogenetic reconstruction algorithm under a model of sequence evolution that allows insertions and deletions (indels). Under appropriate assumptions, our algorithm requires sequence lengths that grow polynomially in the number of leaf taxa. Our techniques are distance-based and largely bypass the problem of multiple alignment.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daskalakis, C., Roch, S. (2010). Alignment-Free Phylogenetic Reconstruction. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_9
DOI: https://doi.org/10.1007/978-3-642-12683-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12682-6
Online ISBN: 978-3-642-12683-3
eBook Packages: Computer Science (R0)