CoMeta: Classification of Metagenomes Using k-mers

Citations: 19
Authors
Kawulok, Jolanta [1]
Deorowicz, Sebastian [1]
Affiliations
[1] Silesian Tech Univ, Inst Informat, Gliwice, Poland
Keywords
COMMUNITIES; BACTERIAL; IDENTIFICATION; DISCOVERY; PRODUCTS; RESOURCE; ESTERASE; GENE;
DOI
10.1371/journal.pone.0121453
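Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09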
Abstract
The study of environmental samples is developing rapidly. Characterizing the composition of an environment broadens our knowledge of the relationship between species composition and environmental conditions. An important step in determining the composition of a sample is comparing the extracted DNA fragments with sequences derived from known organisms. In this paper, we introduce an algorithm called CoMeta (Classification of Metagenomes), which assigns a query read (a DNA fragment) to one of the groups previously prepared by the user. Typically, the groups correspond to taxa at one taxonomic rank (e.g., phylum, genus); however, the prepared groups may instead contain sequences having various functions. CoMeta classifies reads with an exact method based on short subsequences (k-mers) and uses a fast program for indexing large sets of k-mers. In contrast to the most popular methods based on BLAST, in which the query is compared with each reference sequence, CoMeta begins the classification at the top of the taxonomy tree to reduce the number of comparisons. The presented experimental study confirms that CoMeta outperforms other programs used in this context. CoMeta is available at https://github.com/jkawulok/cometa under the free GNU GPL 2 license.
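The top-down, k-mer-based idea described in the abstract can be illustrated with a minimal Python sketch. This is not CoMeta's implementation: the `Group` structure, the `match_score` rule (fraction of a read's k-mers found exactly in a group's index), the `threshold` parameter, and the toy sequences are all assumptions made for illustration, and plain Python sets stand in for the fast k-mer indexing program the abstract mentions.

```python
from dataclasses import dataclass, field
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Group:
    """Hypothetical node of a taxonomy tree with its own k-mer index."""
    name: str
    index: Set[str]
    children: List["Group"] = field(default_factory=list)


def leaf(name: str, sequences: List[str], k: int) -> Group:
    """Build a leaf group whose index holds the k-mers of its sequences."""
    index: Set[str] = set()
    for s in sequences:
        index |= kmers(s, k)
    return Group(name, index)


def parent(name: str, children: List[Group]) -> Group:
    """Build an internal group whose index is the union of its children's."""
    index = set().union(*(c.index for c in children))
    return Group(name, index, list(children))


def match_score(read: str, index: Set[str], k: int) -> float:
    """Assumed scoring rule: fraction of the read's k-mers present in index."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return 0.0
    return sum(km in index for km in read_kmers) / len(read_kmers)


def classify(read: str, root: Group, k: int, threshold: float = 0.5) -> List[str]:
    """Descend from the root, following the best-scoring child at each level.

    Only the children of the current node are compared, so whole subtrees
    are skipped. Descent stops when no child reaches the threshold.
    """
    path: List[str] = []
    node = root
    while node.children:
        best = max(node.children, key=lambda c: match_score(read, c.index, k))
        if match_score(read, best.index, k) < threshold:
            break
        path.append(best.name)
        node = best
    return path


# Toy reference data (invented for this example).
K = 4
genus_a1 = leaf("genus_A1", ["ACGTACGTGGCCAATT"], K)
genus_a2 = leaf("genus_A2", ["TTTTGGGGCCCCAAAA"], K)
genus_b1 = leaf("genus_B1", ["GATTACAGATTACAGA"], K)
root = parent("root", [
    parent("phylum_A", [genus_a1, genus_a2]),
    parent("phylum_B", [genus_b1]),
])

print(classify("ACGTGGCC", root, K))  # read taken from the genus_A1 sequence
print(classify("AGAGAGAG", root, K))  # read sharing no k-mer with any group
```

In this toy setting the first read is assigned to `phylum_A` and then `genus_A1` without ever being compared against `genus_B1`, which is the saving the top-down strategy aims for; the second read shares no k-mer with any group and is left unclassified (an empty path).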
Pages: 23