PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes

Cited by: 56
Authors
Gregor, Ivan [1, 2, 3]
Droege, Johannes [1, 2, 3]
Schirmer, Melanie [4]
Quince, Christopher [5]
McHardy, Alice C. [1, 2, 3]
Affiliations
[1] Max Planck Inst Informat, Max Planck Res Grp Computat Genom & Epidemiol, D-66123 Saarbrucken, Germany
[2] Univ Dusseldorf, Dept Algorithm Bioinformat, Dusseldorf, Germany
[3] Helmholtz Ctr Infect Res, Computat Biol Infect Res, Braunschweig, Germany
[4] Broad Inst MIT & Harvard, Cambridge, MA USA
[5] Univ Glasgow, Sch Engn, Glasgow, Lanark, Scotland
Source
PEERJ | 2016 / Vol. 4
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Metagenomics; Taxonomic classification; Machine learning; Bioinformatics; CLASSIFICATION; SEQUENCES; ACCURATE; BACTERIAL;
DOI
10.7717/peerj.1603
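Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract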
Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4-6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows the analysis of Gb-sized metagenomes with inexpensive hardware and the recovery of species- or genus-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods.
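To make the phrase "simultaneous counting of 4-6-mers" concrete, the following is a minimal Python sketch of counting all 4-, 5- and 6-mers of a sequence in a single pass. It is an illustration only and is not the PhyloPythiaS+ algorithm, whose details the abstract does not give; the function name `count_kmers_4_to_6` and the choice to skip windows containing ambiguous bases are assumptions made for this example.

```python
from collections import Counter


def count_kmers_4_to_6(seq):
    """Count all 4-, 5- and 6-mers of `seq` in one left-to-right pass.

    Hypothetical helper for illustration; not the authors' implementation.
    Windows that contain a character other than A, C, G or T are skipped.
    """
    seq = seq.upper()
    counts = {k: Counter() for k in (4, 5, 6)}
    valid = 0  # length of the current run of unambiguous bases ending at i
    for i, base in enumerate(seq):
        valid = valid + 1 if base in "ACGT" else 0
        # At each position, emit the k-mer ending here for every k that fits.
        for k in (4, 5, 6):
            if valid >= k:
                counts[k][seq[i - k + 1 : i + 1]] += 1
    return counts


if __name__ == "__main__":
    result = count_kmers_4_to_6("ACGTACGT")
    print(result[4].most_common(1))
```

The per-length counters could then be normalized into a composition vector for a classifier; how PhyloPythiaS+ itself derives features from these counts is not stated in the abstract.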
Pages: 21