Clustering of Microbiome Data: Evaluation of Ensemble Design Approaches

Cited by: 4
Authors
Loncar-Turukalo, Tatjana [1]
Lazic, Ivan [1]
Maljkovic, Nina [1]
Brdar, Sanja [2]
Affiliations
[1] Univ Novi Sad, Fac Tech Sci, Novi Sad, Serbia
[2] Univ Novi Sad, BioSense Inst, Novi Sad, Serbia
Source
PROCEEDINGS OF 18TH INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES (IEEE EUROCON 2019) | 2019
Keywords
spectral clustering; kernel k-means; ensemble clustering; microbiome; kernel PCA; DIVERSITY;
DOI
10.1109/eurocon.2019.8861929
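Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract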
The research focus on the human microbiome is moving towards uncovering its association with overall wellbeing and using this knowledge in personalized medicine and connected health. Driven by more affordable high-throughput sequencing, the rate of microbiome data generation has increased, enabling an efficient implementation of data-driven algorithms. This study evaluates the possibilities to identify clusters in human microbiome data based on taxonomic profiles, relying on 24 different β diversity measures and on individual and ensemble clustering approaches. The influence of ensemble creation techniques and parameter selection on the robustness and quality of the consensus partition was explored. Furthermore, we have evaluated changes in the clustering performance after dimensionality reduction. The results indicate that careful selection of the algorithm parameters and ensemble design are needed to ensure a stable consensus partition. Reduction in the number of input features using kernel principal component analysis is accompanied by a loss of discrimination potential.
Pages: 6
Related papers
21 entries in total
[1]  
[Anonymous], 2019, STAT MACH LEARN TECH
[2]  
Brdar S., 2016, INT M COMP INT METH, P199
[3]   Moving pictures of the human microbiome [J].
Caporaso, J. Gregory ;
Lauber, Christian L. ;
Costello, Elizabeth K. ;
Berg-Lyons, Donna ;
Gonzalez, Antonio ;
Stombaugh, Jesse ;
Knights, Dan ;
Gajer, Pawel ;
Ravel, Jacques ;
Fierer, Noah ;
Gordon, Jeffrey I. ;
Knight, Rob .
GENOME BIOLOGY, 2011, 12 (05)
[4]   Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample [J].
Caporaso, J. Gregory ;
Lauber, Christian L. ;
Walters, William A. ;
Berg-Lyons, Donna ;
Lozupone, Catherine A. ;
Turnbaugh, Peter J. ;
Fierer, Noah ;
Knight, Rob .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2011, 108 :4516-4522
[5]   QIIME allows analysis of high-throughput community sequencing data [J].
Caporaso, J. Gregory ;
Kuczynski, Justin ;
Stombaugh, Jesse ;
Bittinger, Kyle ;
Bushman, Frederic D. ;
Costello, Elizabeth K. ;
Fierer, Noah ;
Pena, Antonio Gonzalez ;
Goodrich, Julia K. ;
Gordon, Jeffrey I. ;
Huttley, Gavin A. ;
Kelley, Scott T. ;
Knights, Dan ;
Koenig, Jeremy E. ;
Ley, Ruth E. ;
Lozupone, Catherine A. ;
McDonald, Daniel ;
Muegge, Brian D. ;
Pirrung, Meg ;
Reeder, Jens ;
Sevinsky, Joel R. ;
Turnbaugh, Peter J. ;
Walters, William A. ;
Widmann, Jeremy ;
Yatsunenko, Tanya ;
Zaneveld, Jesse ;
Knight, Rob .
NATURE METHODS, 2010, 7 (05) :335-336
[6]   Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB [J].
DeSantis, T. Z. ;
Hugenholtz, P. ;
Larsen, N. ;
Rojas, M. ;
Brodie, E. L. ;
Keller, K. ;
Huber, T. ;
Dalevi, D. ;
Hu, P. ;
Andersen, G. L. .
APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 2006, 72 (07) :5069-5072
[7]  
Dhillon Inderjit S, 2004, P 10 ACM SIGKDD INT, P551, DOI DOI 10.1145/1014052.1014118
[8]   Search and clustering orders of magnitude faster than BLAST [J].
Edgar, Robert C. .
BIOINFORMATICS, 2010, 26 (19) :2460-2461
[9]   Combining multiple clusterings using evidence accumulation [J].
Fred, ALN ;
Jain, AK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (06) :835-850
[10]  
Fred ALN, 2013, ADV COMPUT VIS PATT, P85, DOI 10.1007/978-1-4471-5628-4_5