Substitution Matrices and Mutual Information Approaches to Modeling Evolution

  • Conference paper
Learning and Intelligent Optimization (LION 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5851)

Abstract

Substitution matrices are at the heart of bioinformatics: sequence alignment, database search, phylogenetic inference and protein family classification all rely on BLOSUM, PAM, JTT, mtREV24 and other matrices. These matrices provide a means of computing models of evolution and of assessing the statistical relationships among sequences. This paper reports two results. First, we show how Bayesian and grid settings can be used to derive novel, taxon-specific substitution matrices for fish and insects, and we discuss their performance relative to standard amino acid replacement matrices. Second, we discuss a novel application of these matrices: a refinement of the mutual information formula for amino acid alignments that incorporates a substitution matrix into the calculation. We show that different substitution matrices yield qualitatively different mutual information results and that the new algorithm gives better estimates of the similarity along a sequence alignment. We thus propose a procedure: generate ad hoc substitution matrices from a collection of sequences, then combine those matrices with mutual information to detect sequence patterns.
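To make the second result more concrete, the following is a minimal Python sketch of mutual information between two alignment columns, together with one hypothetical way a substitution matrix could enter the calculation. The abstract does not state the authors' formula, so this is not their method: the function names, the `weights` structure (row-normalised similarity weights, assumed to be derived somehow from a substitution matrix) and the toy values are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_mutual_information(col_x, col_y):
    """Plain mutual information (in bits) between two alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_x)
    joint = Counter(zip(col_x, col_y))
    px = Counter(col_x)
    py = Counter(col_y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def smoothed_mutual_information(col_x, col_y, weights):
    """Hypothetical variant: spread each observed residue over similar residues.

    weights[a][a2] is the share of an observed residue `a` credited to `a2`;
    each row is assumed to sum to 1. How such weights are obtained from a
    substitution matrix is left open here (an assumption, not the paper's rule).
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_x)
    joint = {}
    for a, b in zip(col_x, col_y):
        for a2, wa in weights[a].items():
            for b2, wb in weights[b].items():
                joint[(a2, b2)] = joint.get((a2, b2), 0.0) + wa * wb / n
    # Marginals of the smoothed joint distribution.
    px, py = {}, {}
    for (a, b), p in joint.items():
        px[a] = px.get(a, 0.0) + p
        py[b] = py.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (px[a] * py[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


# Toy columns from a four-sequence alignment (made-up data).
col_x = "AAGG"
col_y = "LLVV"

# Identity weights reproduce the plain estimate.
identity = {r: {r: 1.0} for r in "AGLV"}

# Toy weights treating L and V as fully interchangeable (illustrative only).
lv_merged = {
    "A": {"A": 1.0},
    "G": {"G": 1.0},
    "L": {"L": 0.5, "V": 0.5},
    "V": {"L": 0.5, "V": 0.5},
}

print(column_mutual_information(col_x, col_y))
print(smoothed_mutual_information(col_x, col_y, identity))
print(smoothed_mutual_information(col_x, col_y, lv_merged))
```

With identity weights the variant reduces to the plain estimate, while weights that treat two residues as interchangeable remove the signal carried by the difference between them. This is one simple way to see why the choice of substitution matrix could change mutual information results qualitatively.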




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kitchovitch, S., Song, Y., van der Wath, R., Liò, P. (2009). Substitution Matrices and Mutual Information Approaches to Modeling Evolution. In: Stützle, T. (eds) Learning and Intelligent Optimization. LION 2009. Lecture Notes in Computer Science, vol 5851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11169-3_19

  • DOI: https://doi.org/10.1007/978-3-642-11169-3_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11168-6

  • Online ISBN: 978-3-642-11169-3

  • eBook Packages: Computer Science, Computer Science (R0)
