Combinatorial Problems Arising in SNP and Haplotype Analysis

  • Conference paper
Discrete Mathematics and Theoretical Computer Science (DMTCS 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2731)

Abstract

It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. This paper presents a number of algorithmic and combinatorial problems that arise when studying a very common form of genomic variation, single nucleotide polymorphisms (SNPs). We review recent results and present challenging open problems.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Halldórsson, B.V., Bafna, V., Edwards, N., Lippert, R., Yooseph, S., Istrail, S. (2003). Combinatorial Problems Arising in SNP and Haplotype Analysis. In: Calude, C.S., Dinneen, M.J., Vajnovszki, V. (eds) Discrete Mathematics and Theoretical Computer Science. DMTCS 2003. Lecture Notes in Computer Science, vol 2731. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45066-1_3

  • DOI: https://doi.org/10.1007/3-540-45066-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40505-4

  • Online ISBN: 978-3-540-45066-5

  • eBook Packages: Springer Book Archive
