Sequence clustering in bioinformatics: an empirical study

Cited by: 196
Authors
Zou, Quan [1 ,2 ,3 ,4 ]
Lin, Gang [1 ]
Jiang, Xingpeng [5 ]
Liu, Xiangrong [6 ]
Zeng, Xiangxiang [6 ]
Affiliations
[1] Tianjin Univ, Tianjin, Peoples R China
[2] Univ Elect Sci & Technol China, Chengdu, Sichuan, Peoples R China
[3] IEEE, Piscataway, NJ USA
[4] ACM, New York, NY USA
[5] Cent China Normal Univ, Wuhan, Hubei, Peoples R China
[6] Xiamen Univ, Xiamen, Fujian, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
operational taxonomic unit; 16S ribosomal RNA; microbiome; sequence clustering; sequence redundancy removal; 16S RIBOSOMAL-RNA; CD-HIT; PROTEIN; ALGORITHMS; CHALLENGES; PREDICTION; ALIGNMENT; ENSEMBLE; SITES; DNA;
DOI
10.1093/bib/bby090
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Sequence clustering is a basic bioinformatics task that is attracting renewed attention with the development of metagenomics and microbiomics. The latest sequencing techniques have decreased costs and, as a result, massive amounts of DNA/RNA sequences are being produced. The challenge is to cluster the sequence data using stable, quick and accurate methods. For microbiome sequencing data, 16S ribosomal RNA operational taxonomic units are typically used. However, there is often a gap between algorithm developers and bioinformatics users. Different software tools can produce diverse results, and users can find them difficult to analyze. Understanding the different clustering mechanisms is crucial to understanding the results that they produce. In this review, we selected several popular clustering tools, briefly explained the key computing principles, analyzed their characteristics and compared them using two independent benchmark datasets. Our aim is to assist bioinformatics users in employing suitable clustering tools effectively to analyze big sequencing data.
Pages: 1-10
Page count: 10