Abstract
Deciding whether a substitution matrix is embeddable (i.e., whether the corresponding Markov process has a continuous-time realization) is an open problem even for \(4\times 4\) matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these \(4\times 4\) matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues and also diagonal-largest-in-column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study completes the solution of the embedding problem for the more general K81 model and its submodels, which was initiated by the last two authors in a separate work.
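As an illustration of the embedding question, the following minimal sketch checks only whether the principal logarithm of a K80 Markov matrix is a rate matrix. It assumes the usual K80 parametrization, with transition probability b, probability c for each of the two transversions, and diagonal entry a = 1 - b - 2c; the function names are hypothetical and do not come from the article. A positive answer is sufficient for embeddability. A negative answer is not conclusive, because other real logarithms may exist when an eigenvalue is repeated, and that case is what the full characterization in the article addresses.

```python
import math


def k80_eigenvalues(b, c):
    """Non-trivial eigenvalues (x, y) of a K80 Markov matrix.

    Assumed parametrization: b is the transition probability, c the
    probability of each of the two transversions, a = 1 - b - 2c.
    x = a + b - 2c is simple; y = a - b has multiplicity two.
    """
    a = 1.0 - b - 2.0 * c
    return a + b - 2.0 * c, a - b


def principal_log_rates(b, c):
    """Rates (beta, gamma) of the principal logarithm, or None.

    The principal logarithm of a K80 matrix has K80 shape with
    eigenvalues -4*gamma = log(x) and -2*beta - 2*gamma = log(y).
    It is real only when both x and y are positive.
    """
    x, y = k80_eigenvalues(b, c)
    if x <= 0.0 or y <= 0.0:
        return None
    gamma = -math.log(x) / 4.0
    beta = -math.log(y) / 2.0 - gamma
    return beta, gamma


def principal_log_is_rate_matrix(b, c, tol=1e-12):
    """True when the principal logarithm has non-negative rates.

    This is a sufficient test for embeddability only, not the full
    characterization given in the article.
    """
    rates = principal_log_rates(b, c)
    return rates is not None and min(rates) >= -tol


if __name__ == "__main__":
    for b, c in [(0.1, 0.05), (0.01, 0.2)]:
        print(b, c, principal_log_rates(b, c), principal_log_is_rate_matrix(b, c))
```

For positive x and y, the test is equivalent to requiring x >= y**2, since gamma is non-negative whenever x <= 1 and beta is non-negative exactly when log(x)/4 >= log(y)/2.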
Acknowledgements
All authors are partially funded by AGAUR Project 2017 SGR-932 and MINECO/FEDER Projects MTM2015-69135 and MDM-2014-0445. J Roca-Lacostena has also received funding from Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (AGAUR 2018FI_B_00947) and European Social Funds. The authors would like to thank Jeremy Sumner for his remarks and for interesting conversations on the topic. They are also grateful to the anonymous reviewers for useful comments on the first version of the manuscript, which greatly improved the paper.
Author information
Contributions
MC and JFS conceived the project, revised the proofs and computations and drafted part of the manuscript. JRL wrote the core of the manuscript and worked out the proofs and computations. All authors read, revised and approved the final manuscript.
About this article
Cite this article
Casanellas, M., Fernández-Sánchez, J. & Roca-Lacostena, J. Embeddability and rate identifiability of Kimura 2-parameter matrices. J. Math. Biol. 80, 995–1019 (2020). https://doi.org/10.1007/s00285-019-01446-0
DOI: https://doi.org/10.1007/s00285-019-01446-0
Keywords
- Nucleotide substitution model
- Markov matrix
- Markov generator
- Matrix logarithm
- Embedding problem
- Rate identifiability