
A Conditional Autoregressive Model for Detecting Natural Selection in Protein-Coding DNA Sequences

  • Conference paper
Topics in Applied Statistics

Part of the book series: Springer Proceedings in Mathematics & Statistics ((PROMS,volume 55))


Abstract

Phylogenetics, the study of evolutionary relationships among groups of organisms, plays an important role in modern biological research, including genomic comparison, detection of orthology and paralogy, estimation of divergence times, reconstruction of ancient proteins, identification of mutations likely to be associated with disease, determination of the identity of new pathogens, and identification of residues that are important to natural selection. Given an alignment of protein-coding DNA sequences, most methods for detecting natural selection rely on estimating the codon-specific nonsynonymous/synonymous rate ratio ($d_N/d_S$). Here, we describe an approach to modeling variation in $d_N/d_S$ using a conditional autoregressive (CAR) model. The CAR model relaxes an assumption made by most contemporary phylogenetic models, namely that sites in molecular sequences evolve independently. By incorporating the information stored in a Protein Data Bank (PDB) file, the CAR model estimates $d_N/d_S$ based on the three-dimensional structure of the protein. We implement the model in a fully Bayesian framework, treating all model parameters as random variables, and use NVIDIA's parallel computing architecture (CUDA) to accelerate the calculation. Our analysis of an empirical abalone sperm lysin data set agrees with previous findings.
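
To make the idea of structure-based dependence among sites concrete, the following is a minimal sketch of a generic proper CAR prior, not the implementation used in the chapter. It assumes that two codon sites are neighbors when their residue coordinates (for example, C-alpha positions read from a PDB file) lie within a distance cutoff, and that the quantity modeled at each site is a real-valued transform of the rate ratio such as its logarithm. The function names, the cutoff, and the parameters `mu`, `rho`, and `tau2` are all hypothetical choices made for illustration.

```python
import math


def adjacency_from_coords(coords, cutoff):
    """Return a symmetric 0/1 neighbor matrix.

    Sites i and j are treated as neighbors when their 3D distance is at
    most `cutoff` (an assumed threshold, in the units of `coords`).
    """
    n = len(coords)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                w[i][j] = w[j][i] = 1
    return w


def car_conditional(i, x, w, mu=0.0, rho=0.9, tau2=1.0):
    """Conditional mean and variance of x[i] given all other sites.

    Under a proper CAR prior, the mean is mu plus rho times the average
    deviation of the neighbors from mu, and the variance is tau2 divided
    by the number of neighbors. Returning (mu, tau2) for a site with no
    neighbors is a convention assumed here, not part of the CAR definition.
    """
    nbrs = [j for j, wij in enumerate(w[i]) if wij]
    if not nbrs:
        return mu, tau2
    m = len(nbrs)
    mean = mu + rho * sum(x[j] - mu for j in nbrs) / m
    return mean, tau2 / m


# Toy example: three residues close together and one isolated residue.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (20.0, 20.0, 20.0)]
w = adjacency_from_coords(coords, cutoff=5.0)
log_ratio = [0.2, -0.4, 1.0, 0.0]  # made-up site-specific values
mean0, var0 = car_conditional(0, log_ratio, w, mu=0.0, rho=0.5, tau2=1.0)
```

In this sketch the conditional distribution at a site is pulled toward the values at its structural neighbors, which is the sense in which a CAR prior relaxes the independence assumption. A full analysis would combine such a prior with a codon substitution likelihood and sample the posterior by MCMC.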



Author information


Corresponding author

Correspondence to Yu Fan.


Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Fan, Y., Wu, R., Chen, MH., Kuo, L., Lewis, P.O. (2013). A Conditional Autoregressive Model for Detecting Natural Selection in Protein-Coding DNA Sequences. In: Hu, M., Liu, Y., Lin, J. (eds) Topics in Applied Statistics. Springer Proceedings in Mathematics & Statistics, vol 55. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7846-1_17
