Convex hull;
Convex hull distance;
Metagenomic binning;
Multiple k values;
High dimensional data clustering;
Clustering algorithm;
CLASSIFICATION;
SEQUENCES;
GENOMES;
ALGORITHM;
COVERAGE;
DOI:
10.1016/j.compbiolchem.2022.107734
Chinese Library Classification number:
Q [Biological sciences];
Discipline classification codes:
07 ;
0710 ;
09 ;
Abstract:
Metagenomics has enabled culture-independent analysis of the micro-organisms present in environmental samples. Metagenomic binning, which groups contigs into bins that represent different taxonomic groups, is an important step of a typical metagenomic workflow and is performed after assembly. Most metagenomic binning tools represent the composition and coverage information of contigs as feature vectors with a large number of dimensions. However, these tools use the traditional Euclidean or Manhattan distance metrics, which become unreliable in high-dimensional space. We propose CH-Bin, a binning approach that uses convex hull distance to bin contigs represented by high-dimensional feature vectors. We demonstrate, using experimental evidence on simulated and real datasets, that representing contigs with high-dimensional feature vectors can preserve additional information and improve binning results. We further demonstrate that the convex hull distance based binning approach can be used effectively to bin such high-dimensional data. To the best of our knowledge, this is the first time that composition information from oligonucleotides of multiple sizes has been used to represent the composition of contigs, and the first time that a convex hull distance based binning algorithm has been used to bin metagenomic contigs. The source code of CH-Bin is available at https://github.com/kdsuneraavinash/CH-Bin.
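The two ideas named in the abstract, composition features built from several k-mer sizes and the distance from a contig to the convex hull of a bin, can be illustrated with a minimal sketch. It is not taken from the CH-Bin source code. The function names (`composition_vector`, `hull_distance`, `nearest_bin`), the default k values, and the use of a Frank-Wolfe iteration to approximate the hull distance are all assumptions made for illustration; the abstract does not say how CH-Bin computes either quantity.

```python
from itertools import product
from math import sqrt


def composition_vector(seq, ks=(3, 4)):
    """Concatenate normalized k-mer frequencies for every k in ks (hypothetical helper)."""
    seq = seq.upper()
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:  # skip windows containing N or other symbols
                counts[word] += 1
        total = sum(counts.values()) or 1  # avoid division by zero on short input
        features.extend(counts[w] / total for w in kmers)
    return features


def hull_distance(point, hull_points, max_iter=1000, tol=1e-12):
    """Approximate distance from point to the convex hull of hull_points.

    Assumption: a Frank-Wolfe iteration with exact line search is used here
    only because it needs no external solver.
    """
    x = list(hull_points[0])  # current estimate of the closest hull point
    for _ in range(max_iter):
        grad = [xi - pi for xi, pi in zip(x, point)]
        # Vertex of the hull that most decreases the squared distance.
        s = min(hull_points, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        step = [xi - si for xi, si in zip(x, s)]
        gap = sum(g * d for g, d in zip(grad, step))
        if gap <= tol:  # no vertex improves the estimate any further
            break
        norm2 = sum(d * d for d in step)
        gamma = min(1.0, gap / norm2)  # exact line search, clipped to the segment
        x = [xi - gamma * d for xi, d in zip(x, step)]
    return sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, point)))


def nearest_bin(point, bins):
    """Return the label of the bin whose convex hull is closest to point."""
    return min(bins, key=lambda label: hull_distance(point, bins[label]))


# Toy 2-D data; real feature vectors would come from composition_vector
# (plus coverage) and have far more dimensions.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
far_square = [(5.0, 5.0), (6.0, 5.0), (6.0, 6.0), (5.0, 6.0)]
print(hull_distance((2.0, 0.5), square))
print(nearest_bin((2.0, 0.5), {"bin_a": square, "bin_b": far_square}))
```

A point inside a hull has distance zero to it. Measuring distance to the hull of a bin, rather than to a single centroid, lets the shape of the bin influence the assignment.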