Deep learning for predicting 16S rRNA gene copy number

Cited by: 0
Authors
Miao, Jiazheng [1, 3]
Chen, Tianlai [1, 4]
Misir, Mustafa [1]
Lin, Yajuan [1, 2]
Affiliations
[1] Duke Kunshan Univ, Div Nat & Appl Sci, Suzhou, Peoples R China
[2] Texas A&M Univ Corpus Christi, Dept Life Sci, Corpus Christi, TX 78412 USA
[3] Harvard Med Sch, Dept Biomed Informat, Boston, MA USA
[4] Duke Univ, Dept Biomed Engn, Durham, NC USA
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
CHARACTERS; REGRESSION; DIVERSITY; PARSIMONY; ABUNDANCE; BACTERIA; DATABASE; ARCHAEA; TOOLS; MODEL;
DOI
10.1038/s41598-024-64658-5
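Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes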
07 ; 0710 ; 09 ;
Abstract
Culture-independent 16S rRNA gene metabarcoding is a commonly used method for microbiome profiling. To achieve more quantitative cell fraction estimates, it is important to account for the 16S rRNA gene copy number (hereafter 16S GCN) of different community members. Several bioinformatic tools are currently available to estimate 16S GCN values, based on either taxonomy assignment or phylogeny. Here we present ANNA16 (Artificial Neural Network Approximator for 16S rRNA gene copy number), a novel deep learning-based method that estimates 16S GCN values directly from 16S gene sequence strings. Based on 27,579 16S rRNA gene sequences and gene copy number data from the rrnDB database, we show that ANNA16 outperforms the commonly used 16S GCN prediction algorithms. Interestingly, Shapley Additive exPlanations (SHAP) shows that ANNA16 can identify unexpected informative positions in 16S rRNA gene sequences without any prior phylogenetic knowledge, which suggests potential applications beyond 16S GCN prediction.
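As an illustration of why 16S GCN matters for cell fraction estimates, the following is a minimal sketch of the standard copy-number correction: each taxon's read count is divided by its copy number and the results are renormalized. The function name, taxon labels, and numbers are hypothetical and are not taken from the paper, ANNA16, or rrnDB.

```python
def gcn_corrected_fractions(read_counts, gcn):
    """Estimate cell fractions from 16S read counts and gene copy numbers.

    read_counts: dict mapping taxon -> number of 16S reads
    gcn:         dict mapping taxon -> 16S gene copy number (must be > 0)
    """
    # Reads divided by copy number is proportional to the number of cells.
    cell_equivalents = {
        taxon: read_counts[taxon] / gcn[taxon] for taxon in read_counts
    }
    total = sum(cell_equivalents.values())
    # Renormalize so the estimated fractions sum to 1.
    return {taxon: value / total for taxon, value in cell_equivalents.items()}


# Hypothetical example values (assumptions for illustration only).
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
fractions = gcn_corrected_fractions(reads, copies)
```

In this made-up case taxon_A accounts for 70% of reads but only 25% of estimated cells, because each of its cells carries seven copies of the gene.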
Pages: 14