Embeddability and rate identifiability of Kimura 2-parameter matrices

Journal of Mathematical Biology

Abstract

Deciding whether a substitution matrix is embeddable (i.e. whether the corresponding Markov process has a continuous-time realization) is an open problem even for \(4\times 4\) matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these \(4\times 4\) matrices, we fully characterize the set of embeddable K80 Markov matrices and the set of embeddable matrices for which rates are identifiable. In particular, we describe an open subset of embeddable matrices with non-identifiable rates. This set contains matrices with positive eigenvalues as well as diagonal-largest-in-column matrices, which may have consequences for parameter estimation in phylogenetics. Finally, we compute the relative volumes of embeddable K80 matrices and of embeddable matrices with identifiable rates. This study completes the solution of the embedding problem for the more general K81 model and its submodels, which was initiated by the last two authors in a separate work.
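
To make the notion of embeddability concrete, the following is a minimal Python sketch that tests whether the principal logarithm of a K80 matrix is a valid rate matrix. It is not the characterization derived in the article: it covers only the principal logarithm, so it gives a sufficient condition for embeddability and says nothing about the non-identifiable cases the abstract mentions. The state ordering (A, G, C, T), the parameter names `b` (transition probability) and `c` (probability of each transversion), and all function names are assumptions made for this illustration.

```python
import math


def k80_matrix(b, c):
    """K80 Markov matrix with states ordered A, G, C, T (assumed ordering).

    b is the transition probability (A<->G, C<->T) and c is the probability
    of each of the two transversions, so each row sums to 1.
    """
    a = 1.0 - b - 2.0 * c
    return [
        [a, b, c, c],
        [b, a, c, c],
        [c, c, a, b],
        [c, c, b, a],
    ]


def principal_log_rates(b, c):
    """Return (alpha, beta) if the principal logarithm is a K80 rate matrix.

    The eigenvalues of the K80 matrix are 1, x = 1 - 2b - 2c (double) and
    y = 1 - 4c. A K80 rate matrix with transition rate alpha and
    transversion rate beta has eigenvalues 0, -2(alpha + beta), -4 beta,
    so matching exponentials gives the formulas below.
    Returns None when the principal logarithm is not a real rate matrix.
    """
    x = 1.0 - 2.0 * b - 2.0 * c
    y = 1.0 - 4.0 * c
    if x <= 0.0 or y <= 0.0:
        return None  # the principal logarithm is not real
    beta = -math.log(y) / 4.0
    alpha = -math.log(x) / 2.0 - beta
    if alpha < 0.0 or beta < 0.0:
        return None  # a negative off-diagonal entry: not a rate matrix
    return alpha, beta


def rates_to_probabilities(alpha, beta):
    """Inverse map: entries (b, c) of exp(Q) for K80 rates alpha, beta."""
    x = math.exp(-2.0 * (alpha + beta))
    y = math.exp(-4.0 * beta)
    c = (1.0 - y) / 4.0
    b = (1.0 + y - 2.0 * x) / 4.0
    return b, c


if __name__ == "__main__":
    rates = principal_log_rates(0.1, 0.05)
    print("rates for b=0.1, c=0.05:", rates)
    print("round trip:", rates_to_probabilities(*rates))
    print("b=0.01, c=0.2:", principal_log_rates(0.01, 0.2))
```

In this sketch the test reduces to the inequalities x > 0, y > 0 and x² ≤ y. A matrix that fails it may still be embeddable through a non-principal logarithm, which is exactly the situation in which rates can fail to be identifiable.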

Acknowledgements

All authors are partially funded by AGAUR Project 2017 SGR-932 and MINECO/FEDER Projects MTM2015-69135 and MDM-2014-0445. J. Roca-Lacostena has also received funding from Secretaria d’Universitats i Recerca de la Generalitat de Catalunya (AGAUR 2018FI_B_00947) and European Social Funds. The authors would like to express their gratitude to Jeremy Sumner for his remarks and interesting conversations on the topic. They are also grateful to the anonymous reviewers for useful comments on the first version of the manuscript, which greatly improved the paper.

Author information

Contributions

MC and JFS conceived the project, revised the proofs and computations and drafted part of the manuscript. JRL wrote the core of the manuscript and worked out the proofs and computations. All authors read, revised and approved the final manuscript.

Corresponding author

Correspondence to Marta Casanellas.

Cite this article

Casanellas, M., Fernández-Sánchez, J. & Roca-Lacostena, J. Embeddability and rate identifiability of Kimura 2-parameter matrices. J. Math. Biol. 80, 995–1019 (2020). https://doi.org/10.1007/s00285-019-01446-0

  • DOI: https://doi.org/10.1007/s00285-019-01446-0
