Abstract
It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. This paper presents a number of algorithmic and combinatorial problems that arise when studying a very common form of genomic variation, single nucleotide polymorphisms (SNPs). We review recent results and present challenging open problems.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Halldórsson, B.V., Bafna, V., Edwards, N., Lippert, R., Yooseph, S., Istrail, S. (2003). Combinatorial Problems Arising in SNP and Haplotype Analysis. In: Calude, C.S., Dinneen, M.J., Vajnovszki, V. (eds) Discrete Mathematics and Theoretical Computer Science. DMTCS 2003. Lecture Notes in Computer Science, vol 2731. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45066-1_3
DOI: https://doi.org/10.1007/3-540-45066-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40505-4
Online ISBN: 978-3-540-45066-5
eBook Packages: Springer Book Archive