CH-Bin: A convex hull based approach for binning metagenomic contigs

Times cited: 3
Authors
Chandrasiri, Sunera [1]
Perera, Thumula [1]
Dilhara, Anjala [1]
Perera, Indika [1]
Mallawaarachchi, Vijini [2,3]
Affiliations
[1] Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa 10400, Sri Lanka
[2] Australian Natl Univ, Sch Comp, Canberra, ACT 2600, Australia
[3] Flinders Univ S Australia, Flinders Accelerator Microbiome Explorat, Bedford Pk, SA 5042, Australia
Funding
National Institutes of Health (USA)
Keywords
Convex hull; Convex hull distance; Metagenomic binning; Multiple k values; High-dimensional data clustering; Clustering algorithm; Classification; Sequences; Genomes; Algorithm; Coverage
DOI
10.1016/j.compbiolchem.2022.107734
Chinese Library Classification
Q [Biological Sciences]
Discipline codes
07; 0710; 09
Abstract
Metagenomics has enabled culture-independent analysis of the microorganisms present in environmental samples. Metagenomic binning, which groups contigs into bins that represent different taxonomic groups, is an important step that follows assembly in a typical metagenomic workflow. Most metagenomic binning tools represent the composition and coverage information of contigs as feature vectors with a large number of dimensions. However, these tools rely on traditional Euclidean or Manhattan distance metrics, which become unreliable in high-dimensional spaces. We propose CH-Bin, a binning approach that uses convex hull distance to bin contigs represented by high-dimensional feature vectors. Using experiments on simulated and real datasets, we show that representing contigs with high-dimensional feature vectors can preserve additional information and improve binning results. We further show that the convex hull distance-based binning approach can effectively bin such high-dimensional data. To the best of our knowledge, this is the first time that composition information from oligonucleotides of multiple sizes has been used to represent the composition of contigs, and the first time that a convex hull distance-based binning algorithm has been used to bin metagenomic contigs. The source code of CH-Bin is available at https://github.com/kdsuneraavinash/CH-Bin.
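The abstract combines two ideas: contig composition vectors built from oligonucleotides of several sizes, and a convex hull distance for comparing a contig with a bin. The plain-Python sketch below illustrates both under stated assumptions. The k values (3, 4, 5), the absence of reverse-complement merging, and the nearest-hull assignment rule (`assign_to_bin`) are hypothetical illustrative choices, not CH-Bin's documented settings. The point-to-hull distance is computed here with Gilbert's algorithm (Frank-Wolfe with exact line search), which may differ from the solver CH-Bin actually uses; the repository linked above is the authoritative implementation.

```python
from itertools import product
import math
from typing import Dict, List, Sequence

Vector = List[float]


def kmer_profile(seq: str, ks: Sequence[int] = (3, 4, 5)) -> Vector:
    """Concatenate normalized k-mer frequency vectors for several k values.

    Assumption (illustrative only): k = 3, 4, 5 and no reverse-complement merging.
    """
    seq = seq.upper()
    features: Vector = []
    for k in ks:
        # Fixed lexicographic ordering of all 4**k k-mers over ACGT.
        index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
        counts = [0] * len(index)
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:  # skip k-mers containing N or other non-ACGT symbols
                counts[j] += 1
        total = sum(counts)
        features.extend(c / total if total else 0.0 for c in counts)
    return features


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def _sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [ai - bi for ai, bi in zip(a, b)]


def convex_hull_distance(x: Sequence[float], points: Sequence[Sequence[float]],
                         iters: int = 1000, tol: float = 1e-12) -> float:
    """Euclidean distance from x to the convex hull of `points`.

    Gilbert's algorithm (Frank-Wolfe with exact line search) minimizes
    ||y - x||^2 over y in conv(points); the result is close to 0.0 when x
    lies inside the hull (up to `iters` and `tol`).
    """
    # Start at the hull vertex nearest to x.
    y = list(min(points, key=lambda p: _dot(_sub(p, x), _sub(p, x))))
    for _ in range(iters):
        r = _sub(y, x)                             # half the gradient of ||y - x||^2
        s = min(points, key=lambda p: _dot(p, r))  # vertex minimizing the linearization
        d = _sub(y, s)
        gap = _dot(r, d)                           # Frank-Wolfe duality gap (>= 0)
        if gap <= tol:
            break
        gamma = min(1.0, gap / _dot(d, d))         # exact line search on segment [y, s]
        y = [yi - gamma * di for yi, di in zip(y, d)]
    return math.sqrt(_dot(_sub(y, x), _sub(y, x)))


def assign_to_bin(x: Sequence[float], bins: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Hypothetical rule: assign x to the bin whose convex hull is nearest."""
    return min(bins, key=lambda label: convex_hull_distance(x, bins[label]))


if __name__ == "__main__":
    square = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(convex_hull_distance([2.0, 0.5], square))  # 1.0
    print(convex_hull_distance([0.5, 0.5], square))  # 0.0 (inside the hull)
    print(len(kmer_profile("ACGTACGTNNACGT")))       # 64 + 256 + 1024 = 1344
```

Concatenating several k values yields 1,344 composition features for k = 3, 4, 5 in this sketch, which is the high-dimensional regime where the abstract argues Euclidean and Manhattan distances become unreliable. A hull-based distance measures how far a contig lies from the region spanned by a bin's members rather than from a single centroid. Frank-Wolfe converges slowly on some inputs, so a dedicated quadratic-programming solver may be preferable when precise distances matter.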
Pages: 9
References: 56