Efficient Online Laplacian Eigenmap Computation for Dimensionality Reduction in Molecular Phylogeny via Optimisation on the Sphere

被引:0
作者
Chretien, Stephane [1 ]
Guyeux, Christophe [2 ]
机构
[1] Natl Phys Lab, Hampton Rd, Teddington TW11 0LW, Middx, England
[2] Univ Bourgogne Franche Comte, Femto ST Inst, UMR 6174, CNRS, Besancon, France
来源
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2019, PT I | 2019年 / 11465卷
关键词
Nonlinear dimentionality reduction; Laplacian eigenmap; Online matrix completion; Biomolecular phylogeny; SEQUENCE; ANNOTATION;
D O I
10.1007/978-3-030-17938-0_39
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Reconstructing the phylogeny of large groups of large divergent genomes remains a difficult problem to solve, whatever the methods considered. Methods based on distance matrices are blocked due to the calculation of these matrices that is impossible in practice, when Bayesian inference or maximum likelihood methods presuppose multiple alignment of the genomes, which is itself difficult to achieve if precision is required. In this paper, we propose to calculate new distances for randomly selected couples of species over iterations, and then to map the biological sequences in a space of small dimension based on the partial knowledge of this genome similarity matrix. This mapping is then used to obtain a complete graph from which a minimum spanning tree representing the phylogenetic links between species is extracted. This new online Newton method for the computation of eigenvectors that solves the problem of constructing the Laplacian eigenmap for molecular phylogeny is finally applied on a set of more than two thousand complete chloroplasts.
引用
收藏
页码:441 / 452
页数:12
相关论文
共 20 条
[1]  
Absil PA, 2008, OPTIMIZATION ALGORITHMS ON MATRIX MANIFOLDS, P1
[2]   PCA and Clustering Reveal Alternate mtDNA Phylogeny of N and M Clades [J].
Alexe, G. ;
Vijaya Satya, R. ;
Seiler, M. ;
Platt, D. ;
Bhanot, T. ;
Hui, S. ;
Tanaka, M. ;
Levine, A. J. ;
Bhanot, G. .
JOURNAL OF MOLECULAR EVOLUTION, 2008, 67 (05) :465-487
[3]  
[Anonymous], 2008, P 7 PYTHON SCI C
[4]   Laplacian eigenmaps for dimensionality reduction and data representation [J].
Belkin, M ;
Niyogi, P .
NEURAL COMPUTATION, 2003, 15 (06) :1373-1396
[5]  
Boumal N, 2014, J MACH LEARN RES, V15, P1455
[6]   A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model [J].
Bruneau, Marine ;
Mottet, Thierry ;
Moulin, Serge ;
Kerbiriou, Mael ;
Chouly, Franz ;
Chretien, Stephane ;
Guyeux, Christophe .
COMPUTERS IN BIOLOGY AND MEDICINE, 2018, 93 :66-74
[7]   Matrix Completion With Noise [J].
Candes, Emmanuel J. ;
Plan, Yaniv .
PROCEEDINGS OF THE IEEE, 2010, 98 (06) :925-936
[8]  
Chretien S., 2018, ARXIV180401071
[9]   MUSCLE: multiple sequence alignment with high accuracy and high throughput [J].
Edgar, RC .
NUCLEIC ACIDS RESEARCH, 2004, 32 (05) :1792-1797
[10]   Ensuring respect for human rights in employment [J].
Friedman, S .
INDUSTRIAL RELATIONS RESEARCH ASSOCIATION SERIES, PROCEEDINGS, 2001, :1-13