Abstract
Substitution matrices are at the heart of bioinformatics: sequence alignment, database search, phylogenetic inference, and protein family classification all rely on BLOSUM, PAM, JTT, mtREV24, and other matrices. These matrices provide a means of computing models of evolution and assessing the statistical relationships among sequences. This paper reports two results. First, we show how Bayesian and grid settings can be used to derive novel, specific substitution matrices for fish and insects, and we discuss their performance relative to standard amino acid replacement matrices. Second, we discuss a novel application of these matrices: a refinement of the mutual information formula for amino acid alignments that incorporates a substitution matrix into the calculation. We show that different substitution matrices yield qualitatively different mutual information results and that the new algorithm allows better estimates of the similarity along a sequence alignment to be derived. We thus propose a procedure: generating ad hoc substitution matrices from a collection of sequences and combining these matrices with mutual information to detect sequence patterns.
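The abstract does not state the refined formula, so the following is only an illustrative sketch and not the authors' algorithm. It computes the standard mutual information between two alignment columns and then a hypothetical variant in which each observed residue is spread over similar residues before the calculation. The function names and the `weights` table are assumptions made for this example. In practice such weights might be obtained by normalising the rows of a substitution matrix, but the toy values used here are not taken from any real matrix.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Standard mutual information (in bits) between two alignment columns."""
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def smoothed_mutual_information(col_x, col_y, weights):
    """Hypothetical matrix-weighted variant (an assumption, not the paper's formula).

    weights[a][a2] is the share of an observed residue `a` that is credited
    to residue `a2`; each row is assumed to sum to 1.
    """
    n = len(col_x)
    joint = {}
    # Spread every observed residue pair over similar residue pairs.
    for a, b in zip(col_x, col_y):
        for a2, wa in weights[a].items():
            for b2, wb in weights[b].items():
                joint[(a2, b2)] = joint.get((a2, b2), 0.0) + wa * wb / n
    # Marginals of the smoothed joint distribution.
    px, py = {}, {}
    for (a, b), p in joint.items():
        px[a] = px.get(a, 0.0) + p
        py[b] = py.get(b, 0.0) + p
    return sum(
        p * log2(p / (px[a] * py[b])) for (a, b), p in joint.items() if p > 0
    )


# Toy columns: residues in column x covary perfectly with those in column y.
col_x = "AAVV"
col_y = "LLII"

# Identity weights recover the standard calculation.
identity = {r: {r: 1.0} for r in "AVLI"}

# Hypothetical weights that treat A and V as partly interchangeable.
similar = {
    "A": {"A": 0.9, "V": 0.1},
    "V": {"V": 0.9, "A": 0.1},
    "L": {"L": 1.0},
    "I": {"I": 1.0},
}

plain = mutual_information(col_x, col_y)
same = smoothed_mutual_information(col_x, col_y, identity)
softened = smoothed_mutual_information(col_x, col_y, similar)
print(plain, same, softened)
```

With identity weights the variant reduces to the standard value. With the toy similarity weights the covariation between the two columns counts for less, which illustrates how the choice of matrix can change the result.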
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kitchovitch, S., Song, Y., van der Wath, R., Liò, P. (2009). Substitution Matrices and Mutual Information Approaches to Modeling Evolution. In: Stützle, T. (eds) Learning and Intelligent Optimization. LION 2009. Lecture Notes in Computer Science, vol 5851. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11169-3_19
DOI: https://doi.org/10.1007/978-3-642-11169-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11168-6
Online ISBN: 978-3-642-11169-3
eBook Packages: Computer Science (R0)