cBar: a computer program to distinguish plasmid-derived from chromosome-derived sequence fragments in metagenomics data

Times cited: 71
Authors
Zhou, Fengfeng [1 ,2 ,3 ]
Xu, Ying [1 ,2 ,3 ,4 ]
Affiliations
[1] Univ Georgia, Computat Syst Biol Lab, Dept Biochem & Mol Biol, Athens, GA 30602 USA
[2] Univ Georgia, Inst Bioinformat, Athens, GA 30602 USA
[3] Univ Georgia, BioEnergy Sci Ctr BESC, Athens, GA 30602 USA
[4] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Jilin, Peoples R China
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
CLASSIFICATION;
DOI
10.1093/bioinformatics/btq299
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Huge amounts of metagenomic sequence data have been produced as a result of rapidly increasing worldwide efforts to study microbial communities as a whole. Most, if not all, sequenced metagenomes are complex mixtures of chromosomal and plasmid sequence fragments from multiple organisms, possibly from different kingdoms. Computational methods for predicting genomic elements such as genes differ significantly between chromosomes and plasmids, which raises the need to separate chromosomal from plasmid sequences in a metagenome. We present a program that classifies a metagenome set into chromosomal and plasmid sequences based on their distinguishing pentamer frequencies. On a large training set consisting of all sequenced prokaryotic chromosomes and plasmids, the program achieves ~92% classification accuracy. On a large set of simulated metagenomes with sequence lengths ranging from 300 bp to 100 kbp, the program achieves classification accuracy from 64.45% to 88.75%. On a large independent test set, the program achieves 88.29% classification accuracy.
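The abstract states that classification rests on pentamer (5-mer) frequencies. The Python sketch below illustrates that kind of feature under stated assumptions: a sliding-window count over a single strand, with windows containing non-ACGT characters skipped and counts normalised to relative frequencies. The record does not say whether cBar merges reverse complements or normalises this way. The nearest-centroid rule at the end is a hypothetical stand-in chosen only to show how such vectors could be compared; it is not the classifier cBar uses. All function names (`pentamer_frequencies`, `centroid`, `classify`) are hypothetical.

```python
import math
from itertools import product

K = 5  # pentamers
# All 4**5 = 1024 possible pentamers, in a fixed lexicographic order.
PENTAMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(PENTAMERS)}


def pentamer_frequencies(seq: str) -> list[float]:
    """Relative frequency of each of the 1024 pentamers in seq.

    Assumption of this sketch: one strand only, and any window containing a
    non-ACGT character (e.g. N) is skipped rather than counted.
    """
    seq = seq.upper()
    counts = [0] * len(PENTAMERS)
    total = 0
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:  # fragment shorter than K or entirely ambiguous
        return [0.0] * len(PENTAMERS)
    return [c / total for c in counts]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(seq: str, centroids: dict[str, list[float]]) -> str:
    """Hypothetical nearest-centroid rule: return the closest class label."""
    v = pentamer_frequencies(seq)
    return min(centroids, key=lambda label: euclidean(v, centroids[label]))


if __name__ == "__main__":
    # Toy "training" fragments, purely illustrative (not real genomic data).
    centroids = {
        "chromosome": centroid([pentamer_frequencies("A" * 50)]),
        "plasmid": centroid([pentamer_frequencies("GC" * 25)]),
    }
    print(classify("A" * 30, centroids))
    print(classify("GCGCGCGCGCGC", centroids))
```

In practice a real classifier would be trained on many labelled chromosomal and plasmid fragments; the toy centroids here only demonstrate that each fragment, regardless of length, maps to a fixed 1024-dimensional vector that a classifier can consume.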
Pages: 2051-2052
Page count: 2