Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure

Cited by: 26
Authors
Pronk, Lotte J. U. [1]
Medema, Marnix H. [1]
Affiliations
[1] Wageningen Univ, Bioinformat Grp, Wageningen, Netherlands
Source
MICROBIAL GENOMICS | 2022 / Vol. 8 / Issue 05
Keywords
metagenomics; taxonomy; gene structure; machine learning; biosynthetic gene cluster; BACTERIAL;
DOI
10.1099/mgen.0.000823
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95%, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
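To make the three gene-structure features named in the abstract concrete, the following is a minimal Python sketch of how gene density, gene length and intergenic distance could be summarised for one contig. It is not Whokaryote's implementation: the function name `gene_structure_features`, the 1-based inclusive coordinate convention, the use of simple means, and the "genes per kb" definition of density are all assumptions made for illustration.

```python
from statistics import mean


def gene_structure_features(contig_length, genes):
    """Summarise gene structure on one contig (hypothetical helper).

    contig_length: contig length in base pairs.
    genes: list of (start, end) tuples, assumed 1-based and inclusive,
           such as coordinates parsed from a gene predictor's output.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")

    # Sort by start position so that neighbouring genes are adjacent in the list.
    genes = sorted(genes)

    # Length of each predicted gene in base pairs.
    lengths = [end - start + 1 for start, end in genes]

    # Bases between the end of one gene and the start of the next;
    # a negative value means the two genes overlap.
    gaps = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(genes, genes[1:])
    ]

    return {
        # Assumed definition: number of genes per 1000 bp of contig.
        "gene_density": len(genes) * 1000 / contig_length,
        "mean_gene_length": mean(lengths) if lengths else 0.0,
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
    }


# Toy contig of 10 kb carrying three closely spaced genes.
features = gene_structure_features(
    10_000, [(1, 900), (1001, 2000), (2101, 3000)]
)
print(features)
```

A table of such per-contig values, one row per contig with a known eukaryote or prokaryote label, is the kind of input a random forest classifier could be trained on.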
Pages: 10