Abstract
Reconstructing the phylogeny of large sets of large, divergent genomes remains difficult, whatever the method considered. Distance-based methods are blocked because computing the full distance matrix is infeasible in practice, while Bayesian inference and maximum likelihood methods presuppose a multiple alignment of the genomes, which is itself difficult to obtain when precision is required. In this paper, we propose to compute distances for randomly selected pairs of species over successive iterations, and then to map the biological sequences into a low-dimensional space using only this partial knowledge of the genome similarity matrix. The mapping is then used to build a complete graph, from which a minimum spanning tree representing the phylogenetic links between species is extracted. The resulting online Newton method for computing eigenvectors, which solves the problem of constructing the Laplacian eigenmap for molecular phylogeny, is finally applied to a set of more than two thousand complete chloroplast genomes.
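The last step of the pipeline described above (a complete graph over the embedded species, reduced to a minimum spanning tree) can be illustrated with a minimal sketch. It assumes that edge weights are Euclidean distances between embedded points and uses Prim's algorithm; the function name `minimum_spanning_tree` and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    `points` is a list of coordinate tuples (one per species in the
    low-dimensional embedding). Returns a list of (parent, child, weight)
    edges forming a minimum spanning tree.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0          # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax the edges from the newly added node to all remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


# Toy embedding: two tight pairs of points far apart on a line (assumed data).
toy_embedding = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
toy_tree = minimum_spanning_tree(toy_embedding)
```

This version runs in quadratic time in the number of species, which suits a complete graph since every pair is an edge anyway.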
A A Python Implementation
The following code gives the Python implementation of the method for the more general case of the Stiefel manifold, a generalisation of the sphere. (The case of the sphere corresponds to taking \(r=1\).)
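As a minimal sketch of optimisation on the Stiefel manifold \(\{X \in \mathbb{R}^{n \times r} : X^\top X = I_r\}\), the block below minimises \(\mathrm{tr}(X^\top L X)\) for a symmetric positive semidefinite matrix \(L\) such as a graph Laplacian. It is not the authors' listing: it swaps in plain Riemannian gradient descent with a QR retraction in place of the online Newton method, assumes the full matrix \(L\) is available, and all names (`stiefel_min_trace`, `qr_retraction`, the step-size rule) are hypothetical.

```python
import numpy as np


def qr_retraction(y):
    """Map a full-rank n x r matrix back onto the Stiefel manifold via QR."""
    q, r = np.linalg.qr(y)
    # Fix column signs so that the diagonal of R is positive (unique factor).
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs


def stiefel_min_trace(lap, r, iters=1000, seed=0):
    """Minimise trace(X^T lap X) over X with orthonormal columns.

    Assumes `lap` is symmetric positive semidefinite. The columns of the
    result span the eigenspace of the r smallest eigenvalues (assumption:
    a random start avoids saddle points, which holds generically).
    """
    rng = np.random.default_rng(seed)
    n = lap.shape[0]
    x = qr_retraction(rng.standard_normal((n, r)))
    # Conservative fixed step based on the spectral norm of lap (assumed rule).
    step = 1.0 / (4.0 * np.linalg.norm(lap, 2))
    for _ in range(iters):
        egrad = 2.0 * lap @ x                 # Euclidean gradient
        xtg = x.T @ egrad
        # Project onto the tangent space of the Stiefel manifold at x.
        rgrad = egrad - x @ (0.5 * (xtg + xtg.T))
        x = qr_retraction(x - step * rgrad)   # step, then retract
    return x


# Toy Laplacian of a path graph on five nodes (assumed test data).
path_laplacian = (np.diag([1.0, 2.0, 2.0, 2.0, 1.0])
                  - np.diag(np.ones(4), 1) - np.diag(np.ones(4), -1))
basis = stiefel_min_trace(path_laplacian, r=2)
```

With \(r=1\) the same routine reduces to gradient descent on the unit sphere, where the retraction is simply a normalisation of the iterate.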
© 2019 Springer Nature Switzerland AG
Cite this paper
Chrétien, S., Guyeux, C. (2019). Efficient Online Laplacian Eigenmap Computation for Dimensionality Reduction in Molecular Phylogeny via Optimisation on the Sphere. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_39
DOI: https://doi.org/10.1007/978-3-030-17938-0_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17937-3
Online ISBN: 978-3-030-17938-0
eBook Packages: Computer Science, Computer Science (R0)