Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data

Cited by: 4
Authors
Mahbub, Sazan [1, 2]
Sawmya, Shashata [1]
Saha, Arpita [1]
Reaz, Rezwana [1]
Rahman, M. Sohel [1]
Bayzid, Md. Shamsuzzoha [1, 3]
Affiliations
[1] Bangladesh Univ Engn & Technol, Dept Comp Sci & Engn, Dhaka, Bangladesh
[2] Univ Maryland, Dept Comp Sci, College Pk, MD USA
[3] Bangladesh Univ Engn & Technol, Dept Comp Sci & Engn, ECE Bldg, Dhaka 1205, Bangladesh
Keywords
gene tree; gene tree discordance; incomplete lineage sorting; quartet consistency; quartet distribution; species tree; missing data; gene tree imputation; species trees; maximum-likelihood; coalescent; inference; concatenation; probability; concordance; root
DOI
10.1089/cmb.2022.0212
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a variety of reasons, ranging from sampling biases to biological causes such as gene birth and loss, gene trees are often incomplete, meaning that not all species of interest share a common set of genes. Incomplete gene trees can potentially affect the accuracy of phylogenomic inference. We introduce, for the first time, the problem of imputing the quartet distribution induced by a set of incomplete gene trees, that is, adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated, specially tailored unsupervised deep learning technique that draws on cues from natural language processing, learns the quartet distribution in a given set of incomplete gene trees, and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique that requires no explicit modeling of the subject system, of the reasons for missing data, or of gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, resulting in a dramatic improvement in species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. QT-GILD therefore advances the state of the art in species tree estimation from gene trees in the face of missing data.
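To make the problem setting concrete, the Python sketch below computes the quartet distribution induced by a small set of gene trees and records, for each four-taxon set, how many trees lack at least one of its taxa. It illustrates only the input that imputation works on, not QT-GILD itself. The nested-tuple tree encoding and the helper names (`quartet_distribution`, `induced_quartet`, `fmt`) are hypothetical choices for this example, and gene trees are assumed to be binary so that every four-taxon subset present in a tree resolves to exactly one topology.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of children.
# Rooting is only a convenience: every clade below some node corresponds to one
# edge (bipartition) of the unrooted gene tree.


def _collect_clades(tree, out):
    """Append the leaf set of every subtree of `tree` to `out`; return the root's leaf set."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = frozenset().union(*(_collect_clades(child, out) for child in tree))
    out.append(clade)
    return clade


def induced_quartet(clade_list, quartet):
    """Return the topology {{a, b}, {c, d}} induced on `quartet`, or None if unresolved."""
    q = frozenset(quartet)
    for clade in clade_list:
        shared = clade & q
        if len(shared) == 2:  # this edge separates two quartet taxa from the other two
            return frozenset([shared, q - shared])
    return None  # only possible when the tree has a polytomy


def quartet_distribution(gene_trees, taxa):
    """Count induced quartet topologies over all trees and, per 4-taxon set, trees lacking data."""
    counts = Counter()   # topology -> number of gene trees inducing it
    missing = Counter()  # 4-taxon set -> number of gene trees missing one of its taxa
    for tree in gene_trees:
        clade_list = []
        tree_taxa = _collect_clades(tree, clade_list)
        for quartet in combinations(sorted(taxa), 4):
            if not set(quartet) <= tree_taxa:
                missing[quartet] += 1
                continue
            topology = induced_quartet(clade_list, quartet)
            if topology is not None:
                counts[topology] += 1
    return counts, missing


def fmt(topology):
    """Render a topology such as {{A, B}, {C, D}} as 'AB|CD'."""
    left, right = sorted("".join(sorted(side)) for side in topology)
    return f"{left}|{right}"


if __name__ == "__main__":
    taxa = "ABCDE"
    gene_trees = [
        ((("A", "B"), "C"), ("D", "E")),  # complete gene tree
        (("A", "B"), ("C", "D")),         # taxon E missing
        ((("A", "C"), "B"), "D"),         # taxon E missing
    ]
    counts, missing = quartet_distribution(gene_trees, taxa)
    for label, n in sorted((fmt(t), n) for t, n in counts.items()):
        print(label, n)
    print(dict(missing))
```

In this toy input, the two gene trees that lack taxon E induce no topology for any of the four quartets containing E; those absent entries are what the quartet imputation problem asks to fill in.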
Pages: 1156-1172
Number of pages: 17