Deep Neighbor Information Learning From Evolution Trees for Phylogenetic Likelihood Estimates

Times cited: 2
Authors
Ling, Cheng [1]
Cheng, Wenhao [1]
Zhang, Haoyu [2]
Zhu, Hanhao [3]
Zhang, Hua [4]
Affiliations
[1] Beijing Univ Chem Technol, Dept Comp Sci & Technol, Beijing 100029, Peoples R China
[2] Zhejiang Ocean Univ, Sch Informat Engn, Zhoushan 316022, Peoples R China
[3] Zhejiang Ocean Univ, Inst Marine Sci & Technol, Zhoushan 316022, Peoples R China
[4] Zhejiang Ocean Univ, Sch Naval Architecture & Maritime, Zhoushan 316022, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Phylogeny; Licenses; Radio frequency; Vegetation; Topology; Task analysis; Random forests; Likelihood probability; phylogenetic analysis; machine learning; prediction model; likelihood prediction; evolution trees; MAXIMUM-LIKELIHOOD; DIVERGENCE TIMES; SEQUENCE EVOLUTION; GENETICS ANALYSIS; BAYESIAN METHODS; INFERENCE; SOFTWARE;
DOI
10.1109/ACCESS.2020.3043150
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Phylogenetic analysis approaches based on likelihood probability have contributed to impressive advances in minimizing the variance of evolutionary parameter estimates. However, their practical application is greatly limited by the very time-consuming calculation of Conditional Likelihood Probabilities (CLPs). Obtaining the likelihoods of massive tree samples accurately and quickly can therefore facilitate the phylogenetic analysis process. Inspired by recent advances in machine learning techniques, which have greatly improved performance on many related prediction tasks, this study proposes a Random Forest (RF) based learning and prediction approach called NeoPLE. The approach first learns the deep neighbor information between nodes from the topology representations of evolution trees, integrates likelihood information from these trees, and trains a non-linear prediction model. Instead of depending on the recursive calculation of the CLPs of tree nodes, NeoPLE replaces this process with a prediction by the trained model, so the likelihood estimates become independent of CLP calculations. In terms of performance, speedup factors ranging from 2.1X to 3.5X are achieved on the analysis of realistic data sets. Moreover, NeoPLE is particularly suitable for data sets with a relatively large number of alignment sites: speedups of up to 27.5X are achieved on the analysis of simulated data sets. In addition, NeoPLE is robust across a wide range of evolutionary models and is ready to be integrated into more phylogenetic inference tools. This study fills a gap in phylogenetic analysis by applying machine learning to the feature representation and likelihood prediction of evolution trees, which has not previously been reported in the literature.
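The abstract contrasts NeoPLE with the recursive calculation of Conditional Likelihood Probabilities. For readers unfamiliar with that recursion, here is a minimal sketch of Felsenstein's pruning algorithm, assuming the JC69 substitution model, a rooted binary tree, uniform base frequencies and no rate heterogeneity. It is not NeoPLE and not code from the paper; the names `Node`, `clp` and `log_likelihood` are hypothetical. It illustrates where the cost comes from: every internal node's CLP vector is recomputed for every alignment site, and again for every candidate tree in a sample, which is why savings grow with the number of sites.

```python
import math

# Nucleotide states in a fixed order; base frequencies are uniform under JC69.
STATES = "ACGT"


def jc69_prob(i: int, j: int, t: float) -> float:
    """JC69 transition probability P_ij(t); t is in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


class Node:
    """Hypothetical tree node: a leaf has a name and no children."""

    def __init__(self, name=None, children=None):
        # children: list of (child_node, branch_length) pairs
        self.name = name
        self.children = children or []


def clp(node: Node, site: dict) -> list:
    """Conditional likelihood vector L_node(s) for one alignment column.

    `site` maps each leaf name to its observed nucleotide.
    """
    if not node.children:
        # Leaf: probability 1 for the observed state, 0 otherwise.
        observed = STATES.index(site[node.name])
        return [1.0 if s == observed else 0.0 for s in range(4)]
    result = [1.0] * 4
    for child, t in node.children:
        child_clp = clp(child, site)  # recursive step: the expensive part
        for s in range(4):
            # Sum over the unobserved state x at the child end of the branch.
            result[s] *= sum(jc69_prob(s, x, t) * child_clp[x] for x in range(4))
    return result


def log_likelihood(root: Node, alignment: list) -> float:
    """Sum over sites of log(sum_s pi_s * L_root(s)) with pi_s = 1/4."""
    total = 0.0
    for site in alignment:
        root_clp = clp(root, site)
        total += math.log(sum(0.25 * v for v in root_clp))
    return total


if __name__ == "__main__":
    # Illustrative tree ((A:0.1,B:0.2):0.05,C:0.3) and a two-site alignment.
    example_tree = Node(children=[
        (Node(children=[(Node("A"), 0.1), (Node("B"), 0.2)]), 0.05),
        (Node("C"), 0.3),
    ])
    example_alignment = [{"A": "A", "B": "A", "C": "A"},
                         {"A": "A", "B": "G", "C": "G"}]
    print(log_likelihood(example_tree, example_alignment))
```

A learned predictor such as the one described above would take the place of the `log_likelihood` call for each sampled tree; the sketch makes no claim about which features or model settings NeoPLE actually uses.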
Pages: 220692-220702
Page count: 11