
Efficient Online Laplacian Eigenmap Computation for Dimensionality Reduction in Molecular Phylogeny via Optimisation on the Sphere

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2019)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11465)


Abstract

Reconstructing the phylogeny of large groups of large, divergent genomes remains a difficult problem, whatever the method considered. Distance-based methods are hampered because computing the full distance matrix is impractical at this scale, while Bayesian inference and maximum likelihood methods presuppose a multiple alignment of the genomes, which is itself difficult to obtain when precision is required. In this paper, we propose to compute new distances for randomly selected pairs of species over the iterations, and then to map the biological sequences into a low-dimensional space based on this partial knowledge of the genome similarity matrix. This mapping is then used to build a complete graph, from which a minimum spanning tree representing the phylogenetic links between species is extracted. Finally, the new online Newton method for computing eigenvectors, which solves the problem of constructing the Laplacian eigenmap for molecular phylogeny, is applied to a set of more than two thousand complete chloroplast genomes.
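The pipeline summarised in the abstract (distances for randomly sampled pairs, a Laplacian eigenmap built from the resulting partial similarity matrix, then a minimum spanning tree on the complete graph of embedded points) can be illustrated with a small batch sketch. It is not the authors' implementation and rests on several assumptions: the k-mer distance, the heat-kernel width `sigma` and every function name are hypothetical; unobserved pairs simply get weight zero; and the embedding uses a dense eigendecomposition of the unnormalised Laplacian \(L = D - W\) rather than the online method the paper proposes.

```python
import itertools
import numpy as np


def kmer_profile(seq, k=3):
    """Count the k-mers of a sequence (hypothetical choice of genome signature)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def kmer_distance(a, b, k=3):
    """Normalised L1 distance between k-mer profiles, in [0, 1] (illustrative only)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    diff = sum(abs(pa.get(x, 0) - pb.get(x, 0)) for x in set(pa) | set(pb))
    total = sum(pa.values()) + sum(pb.values())
    return diff / total if total else 0.0


def partial_similarity(seqs, n_pairs, sigma=0.5, seed=0):
    """Fill a similarity matrix W only for n_pairs randomly chosen pairs."""
    n = len(seqs)
    pairs = list(itertools.combinations(range(n), 2))
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(pairs), size=min(n_pairs, len(pairs)), replace=False)
    W = np.zeros((n, n))
    for idx in chosen:
        i, j = pairs[idx]
        d = kmer_distance(seqs[i], seqs[j])
        W[i, j] = W[j, i] = np.exp(-d ** 2 / sigma ** 2)  # heat-kernel weight
    return W


def laplacian_eigenmap(W, dim=2):
    """Embed nodes with eigenvectors of L = D - W for the smallest non-trivial
    eigenvalues (assumes the sampled graph is connected)."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, 1:dim + 1]    # skip the constant eigenvector


def prim_mst(X):
    """Prim's algorithm on the complete graph with Euclidean edge weights."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()             # distance from the tree to each node
    parent = np.zeros(n, dtype=int)   # tree node achieving that distance
    edges = []
    for _ in range(n - 1):
        v = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[v]), v))
        in_tree[v] = True
        closer = dist[v] < best
        best = np.where(closer, dist[v], best)
        parent = np.where(closer, v, parent)
    return edges
```

With too few sampled pairs the similarity graph can become disconnected, in which case \(L\) has several zero eigenvalues and the embedding is no longer meaningful; the number of sampled pairs is the knob that trades distance computations against this risk.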



Author information


Corresponding author

Correspondence to Stéphane Chrétien.



Appendix A: A Python Implementation

The following code gives the Python implementation of the method for the more general case of the Stiefel manifold (the set of \(n \times r\) matrices with orthonormal columns), which generalises the sphere: the case of the sphere corresponds to taking \(r=1\).

[Figure b: Python code listing]
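As an illustration of computing Laplacian eigenvectors by optimisation on the Stiefel manifold \(\{X \in \mathbb{R}^{n \times r} : X^\top X = I_r\}\) (the sphere when \(r = 1\)), here is a minimal online sketch. It swaps in plain Riemannian stochastic gradient descent with a QR retraction in place of the paper's online Newton method. Each step uses a single sampled pair \((i, j)\) with weight \(w_{ij}\), whose contribution to the Laplacian is \(w_{ij}(e_i - e_j)(e_i - e_j)^\top\), so the cost \(\operatorname{tr}(X^\top L X)\) is minimised one pair at a time. The callback `sample_pair`, the step size and the iteration count are assumptions, not values from the paper.

```python
import numpy as np


def qr_retraction(Y):
    """Map an n x r matrix back onto the Stiefel manifold {X : X^T X = I_r}."""
    Q, R = np.linalg.qr(Y)
    # Fix column signs so the retraction is uniquely defined.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def riemannian_grad(X, G):
    """Project a Euclidean gradient G onto the tangent space at X
    (embedded metric): G - X sym(X^T G)."""
    XtG = X.T @ G
    return G - X @ ((XtG + XtG.T) / 2)


def online_eigenmap(n, r, sample_pair, n_iter=5000, step=0.05, seed=0):
    """Riemannian SGD estimate of the r eigenvectors of L = D - W with the
    smallest non-trivial eigenvalues; r = 1 is optimisation on the sphere.

    sample_pair() is a hypothetical callback returning (i, j, w_ij) for one
    randomly drawn pair of species, e.g. from a freshly computed distance.
    """
    rng = np.random.default_rng(seed)
    ones = np.ones((n, 1)) / np.sqrt(n)

    def deflate(Y):
        # Remove the trivial constant eigenvector, then re-orthonormalise.
        return qr_retraction(Y - ones @ (ones.T @ Y))

    X = deflate(rng.standard_normal((n, r)))
    for _ in range(n_iter):
        i, j, w = sample_pair()
        # Euclidean gradient of trace(X^T L_ij X) = w * ||X[i] - X[j]||^2.
        diff = X[i] - X[j]
        G = np.zeros_like(X)
        G[i] = 2.0 * w * diff
        G[j] = -2.0 * w * diff
        X = deflate(X - step * riemannian_grad(X, G))
    return X
```

Projecting out the constant vector \(\mathbf{1}/\sqrt{n}\) at every step keeps the iterate away from the trivial eigenvector with eigenvalue 0. Because pairs are drawn uniformly, the expected per-step cost is proportional to \(\operatorname{tr}(X^\top L X)\), so no step ever needs the full similarity matrix; a Newton-type method would replace the gradient step with a (stochastic) Riemannian Newton step.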


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Chrétien, S., Guyeux, C. (2019). Efficient Online Laplacian Eigenmap Computation for Dimensionality Reduction in Molecular Phylogeny via Optimisation on the Sphere. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_39


  • DOI: https://doi.org/10.1007/978-3-030-17938-0_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17937-3

  • Online ISBN: 978-3-030-17938-0

  • eBook Packages: Computer Science, Computer Science (R0)
