Efficiently Solvable Perfect Phylogeny Problems on Binary and k-State Data with Missing Values

  • Conference paper
Algorithms in Bioinformatics (WABI 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6833)

Abstract

The perfect phylogeny problem is of central importance to both evolutionary biology and population genetics. Missing values are a common occurrence in both sequence and genotype data. In their presence, the problem of finding a perfect phylogeny is NP-hard, even for binary characters [24]. We extend the utility of the perfect phylogeny by introducing new efficient algorithms for broad classes of binary and multi-state data with missing values.
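For complete binary data, a classical characterization makes the problem easy: a set of binary characters admits an (undirected) perfect phylogeny exactly when no pair of columns exhibits all four gametes 00, 01, 10 and 11. The Python sketch below is an illustration, not the authors' algorithm. It applies that test and then fills missing values by exhaustive search, which tries 2^h completions for h missing cells and so only hints at why the missing-data problem is hard. The function names `four_gamete_ok` and `brute_force_completion`, and the use of `None` for a missing cell, are assumptions made for this example.

```python
from itertools import combinations, product
from typing import List, Optional

# Rows are taxa, columns are binary characters; None marks a missing value.
Row = List[Optional[int]]


def four_gamete_ok(matrix: List[List[int]]) -> bool:
    """True if no pair of columns contains all four gametes 00, 01, 10, 11.

    For complete binary data this is the classical test for an
    (undirected) perfect phylogeny.
    """
    n_cols = len(matrix[0]) if matrix else 0
    for i, j in combinations(range(n_cols), 2):
        if len({(row[i], row[j]) for row in matrix}) == 4:
            return False
    return True


def brute_force_completion(matrix: List[Row]) -> Optional[List[List[int]]]:
    """Return the first 0/1 filling of the missing cells that passes the
    four-gamete test, or None if no filling does.

    Exhaustive search over 2**(number of missing cells) candidates, so it
    is only usable on tiny inputs.
    """
    holes = [(r, c) for r, row in enumerate(matrix)
             for c, value in enumerate(row) if value is None]
    for values in product((0, 1), repeat=len(holes)):
        filled = [list(row) for row in matrix]
        for (r, c), value in zip(holes, values):
            filled[r][c] = value
        if four_gamete_ok(filled):
            return filled
    return None
```

For example, `four_gamete_ok([[0, 0], [0, 1], [1, 0], [1, 1]])` returns `False`, while `brute_force_completion([[0, 0], [0, 1], [1, 0], [1, None]])` returns the matrix with the missing cell set to 0, since setting it to 1 would create all four gametes.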

Specifically, we address the rich data hypothesis introduced by Halperin and Karp [11] for the binary perfect phylogeny problem with missing data. We give an efficient algorithm for enumerating phylogenies compatible with characters satisfying the rich data hypothesis. This algorithm is useful for computing the probability of data with missing values under the coalescent model.

In addition, we use the partition intersection (PI) graph and chordal graph theory to generalize the rich data hypothesis to multi-state characters with missing values. For a bounded number of states, k, we provide a fixed-parameter tractable algorithm for the k-state perfect phylogeny problem with missing data. Our approach reduces missing data problems to problems on complete data. Finally, we characterize a commonly observed condition, an m-clique in the PI graph, under which a perfect phylogeny can be found efficiently for binary characters with missing values. We evaluate our results with extensive empirical analysis using two biologically motivated generative models of character data.
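The partition intersection (PI) graph [18] has one vertex per (character, state) class, that is, per set of taxa sharing a state for a character, and an edge between two classes that share a taxon. By Buneman's theorem (see, e.g., [23]), the characters are compatible exactly when this graph has a proper triangulation: a chordal supergraph that adds no edge between two states of the same character. The Python sketch below, which assumes complete k-state data encoded as small integers, builds the PI graph and tests chordality with maximum cardinality search [27]. A chordal PI graph is its own proper triangulation, so a positive answer is sufficient for compatibility but not necessary; finding a proper triangulation in general, and the paper's treatment of missing values, are not reproduced here. `pi_graph` and `is_chordal` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# A PI-graph vertex is (character index, state): the class of taxa
# that have that state for that character.
Vertex = Tuple[int, int]


def pi_graph(matrix: List[List[int]]) -> Dict[Vertex, Set[Vertex]]:
    """Partition intersection graph of complete k-state data.

    Two classes are adjacent when some taxon (row) belongs to both.
    Classes of the same character are disjoint, so never adjacent.
    """
    adj: Dict[Vertex, Set[Vertex]] = {}
    for row in matrix:
        classes = list(enumerate(row))  # [(character, state), ...]
        for v in classes:
            adj.setdefault(v, set())
        # Each taxon contributes a clique over the classes it belongs to.
        for u, v in combinations(classes, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_chordal(adj: Dict[Vertex, Set[Vertex]]) -> bool:
    """Chordality test via maximum cardinality search (MCS)."""
    order: List[Vertex] = []
    weight = {v: 0 for v in adj}
    numbered: Set[Vertex] = set()
    while len(order) < len(adj):
        # Pick the unnumbered vertex with the most numbered neighbours.
        v = max((u for u in adj if u not in numbered), key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for w in adj[v]:
            if w not in numbered:
                weight[w] += 1
    # The reversed MCS order is a perfect elimination ordering iff G is chordal.
    peo = order[::-1]
    pos = {v: i for i, v in enumerate(peo)}
    for v in peo:
        later = [w for w in adj[v] if pos[w] > pos[v]]
        if later:
            # Later neighbours must form a clique; it suffices to check that
            # each is adjacent to the earliest of them.
            first = min(later, key=lambda w: pos[w])
            if any(w != first and w not in adj[first] for w in later):
                return False
    return True
```

On the binary matrix `[[0, 0], [0, 1], [1, 0], [1, 1]]` the PI graph is a chordless 4-cycle, and either chord would join two states of the same character, so no proper triangulation exists and the two characters are incompatible, in agreement with the four-gamete test.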

References

  1. Agarwala, R., Fernandez-Baca, D.: A polynomial-time algorithm for the perfect phylogeny problem when the number of character states is fixed. SIAM Journal on Computing 23(6), 1216–1224 (1994)

  2. Blair, J.R.S., Peyton, B.: An introduction to chordal graphs and clique trees. In: George, A., Gilbert, J.R., Liu, J.W.H. (eds.) Graph Theory and Sparse Matrix Computation, pp. 1–29. Springer, Heidelberg (1993)

  3. Buneman, P.: A characterisation of rigid circuit graphs. Discrete Mathematics 9(3), 205–212 (1974)

  4. Ding, Z., Mailund, T., Song, Y.S.: Efficient whole-genome association mapping using local phylogenies for unphased genotype data. Bioinformatics 24(19), 2215–2221 (2008)

  5. Dirac, G.A.: On rigid circuit graphs. Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg 25, 71–76 (1961)

  6. Dress, A., Steel, M.: Convex tree realizations of partitions. Applied Mathematics Letters 5, 3–6 (1993)

  7. Fulkerson, D.R., Gross, O.A.: Incidence matrices and interval graphs. Pacific Journal of Mathematics 15(3), 835–855 (1965)

  8. Golumbic, M.C.: Algorithmic graph theory and perfect graphs. North-Holland, Amsterdam (2004)

  9. Gusfield, D.: The multi-state perfect phylogeny problem with missing and removable data: Solutions via integer-programming and chordal graph theory. In: Research in Computational Molecular Biology, pp. 236–252. Springer, Heidelberg (2009)

  10. Gusfield, D., Frid, Y., Brown, D.: Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data. In: Lin, G. (ed.) COCOON 2007. LNCS, vol. 4598, pp. 51–64. Springer, Heidelberg (2007)

  11. Halperin, E., Karp, R.M.: Perfect phylogeny and haplotype assignment. In: RECOMB 2004: Proceedings of the 8th Annual International Conference on Computational Molecular Biology, pp. 10–19. ACM Press, New York (2004)

  12. Hudson, R.R.: Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18(2), 337–338 (2002)

  13. Kannan, S., Warnow, T.: A fast algorithm for the computation and enumeration of perfect phylogenies when the number of character states is fixed. In: Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 595–603. Society for Industrial and Applied Mathematics, Philadelphia (1995)

  14. Lekkerkerker, C.G., Boland, J.C.: Representation of a finite graph by a set of intervals on the real line. Fundamenta Mathematicae 51, 45–64 (1962)

  15. Lewis, J.G., Peyton, B.W., Pothen, A.: A fast algorithm for reordering sparse matrices for parallel factorization. SIAM Journal on Scientific and Statistical Computing 10(6), 1146–1173 (1989)

  16. Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165(4), 2213–2233 (2003)

  17. McKee, T.A., McMorris, F.R.: Topics in intersection graph theory. SIAM Monographs on Discrete Mathematics (1999)

  18. McMorris, F.R., Meacham, C.A.: Partition intersection graphs. Ars Combinatoria 16B, 135–138 (1983)

  19. Meacham, C.A.: A manual method for character compatibility analysis. Taxon 30(3), 591–600 (1981)

  20. Pe’er, I., Pupko, T., Shamir, R., Sharan, R.: Incomplete directed perfect phylogeny. SIAM Journal on Computing 33(3), 590–607 (2004)

  21. Pennington, G., Smith, C.A., Shackney, S., Schwartz, R.: Reconstructing tumor phylogenies from heterogeneous single-cell data. Journal of Bioinformatics and Computational Biology 5(2a), 407–427 (2007)

  22. Satya, R., Mukherjee, A.: The undirected incomplete perfect phylogeny problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 5(4), 618–629 (2008)

  23. Semple, C., Steel, M.: Phylogenetics. Oxford University Press, Oxford (2003)

  24. Steel, M.: The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification, 91–116 (1992)

  25. Stephens, M., Smith, N., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. American Journal of Human Genetics 68, 978–989 (2001)

  26. Sze, S.H., Lu, S., Chen, J.: Integrating sample-driven and pattern-driven approaches in motif finding. Algorithms in Bioinformatics, 438–449 (2004)

  27. Tarjan, R., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing 13, 566–579 (1984)

  28. Warnow, T.J.: Tree compatibility and inferring evolutionary history. In: Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1993, pp. 382–391. Society for Industrial and Applied Mathematics, Philadelphia (1993)

  29. Wu, Y.: Exact computation of coalescent likelihood under the infinite sites model. In: Măndoiu, I., Narasimhan, G., Zhang, Y. (eds.) ISBRA 2009. LNCS, vol. 5542, pp. 209–220. Springer, Heidelberg (2009)

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stevens, K., Kirkpatrick, B. (2011). Efficiently Solvable Perfect Phylogeny Problems on Binary and k-State Data with Missing Values. In: Przytycka, T.M., Sagot, MF. (eds) Algorithms in Bioinformatics. WABI 2011. Lecture Notes in Computer Science, vol 6833. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23038-7_24

  • DOI: https://doi.org/10.1007/978-3-642-23038-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23037-0

  • Online ISBN: 978-3-642-23038-7

  • eBook Packages: Computer Science, Computer Science (R0)
