Bagged K-means clustering of metabolome data

Cited by: 6
Authors
Hageman, J. A.
van den Berg, R. A.
Westerhuis, J. A.
Hoefsloot, H. C. J.
Smilde, A. K.
Affiliations
[1] Univ Amsterdam, SILS, NL-1018 WV Amsterdam, Netherlands
[2] TNO, Qual Life, NL-3700 AJ Zeist, Netherlands
Keywords
clustering; metabolomics; correlation; resampling; bootstrap aggregating; bagging; perturbing metabolome models; validation
DOI
10.1080/10408340600969916
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Clustering of metabolomics data can be hampered by noise originating from biological variation, physical sampling error and analytical error. Data analysis methods that are not specifically suited to noisy data will yield suboptimal solutions. Bootstrap aggregating (bagging) is a resampling technique that can deal with noise and improve accuracy. This paper demonstrates the possibilities of bagged clustering applied to metabolomics data. The metabolomics data used in this paper are computer-generated with the human red blood cell model. This model can be perturbed in several ways. In this paper, inhibition experiments are mimicked by reducing enzyme activity to 10% of its original value. When bagged K-means clustering is compared with ordinary K-means, the number of metabolites that switch clusters under the influence of heteroscedastic noise is lower if bagging is used. This favors bagged K-means over ordinary K-means clustering when dealing with noisy metabolomics data. A special validation scheme, independent of the addition of noise, has been devised to demonstrate the positive effects of bagging on clustering.
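The idea of bagged K-means described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say what is resampled or how the bootstrap clusterings are combined, so the choices here are assumptions made for illustration only. The sketch assumes rows are metabolites and columns are experiments, resamples the columns with replacement, aligns each bootstrap clustering to a reference clustering by label permutation, and takes a majority vote per metabolite. The function names (sq_dist, kmeans, bagged_kmeans, make_toy_data), the farthest-point initialisation, and the toy data with noise proportional to the signal level are all hypothetical.

```python
import itertools
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each centre becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def bagged_kmeans(data, k, n_bags=25, seed=0):
    """Majority vote over K-means runs on bootstrap-resampled columns (assumed scheme)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(data), len(data[0])
    reference = kmeans(data, k)  # clustering of the unresampled data
    votes = [[0] * k for _ in range(n_rows)]
    for _ in range(n_bags):
        # Bootstrap: draw n_cols column indices with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        sample = [tuple(row[j] for j in cols) for row in data]
        labels = kmeans(sample, k)
        # Relabel the bootstrap clusters to agree best with the reference.
        best = max(
            itertools.permutations(range(k)),
            key=lambda perm: sum(perm[lab] == ref for lab, ref in zip(labels, reference)),
        )
        for i, lab in enumerate(labels):
            votes[i][best[lab]] += 1
    # Final label: the cluster that received the most votes for each row.
    return [max(range(k), key=lambda c: v[c]) for v in votes]


def make_toy_data(seed=1):
    """Six hypothetical metabolites x eight experiments; noise scales with the level."""
    rng = random.Random(seed)
    base = [1.0] * 3 + [10.0] * 3
    return [tuple(b * (1 + rng.uniform(-0.2, 0.2)) for _ in range(8)) for b in base]


if __name__ == "__main__":
    print(bagged_kmeans(make_toy_data(), k=2))
```

On this toy data the two groups are well separated, so bagged and ordinary K-means agree. The benefit claimed in the abstract (fewer metabolites switching clusters under noise) would only become visible on noisier, less separated data. Trying every label permutation is practical only for small k.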
Pages: 211-220
Number of pages: 10
Related papers
22 in total
[1]  
[Anonymous], 2004, P 16 IEEE INT C TOOL
[2]  
[Anonymous], MODELING METABOLISM
[3]   A test case of correlation metric construction of a reaction pathway from measurements [J].
Arkin, A ;
Shen, PD ;
Ross, J .
SCIENCE, 1997, 277 (5330) :1275-1279
[4]   Bagging predictors [J].
Breiman, L .
MACHINE LEARNING, 1996, 24 (02) :123-140
[5]   The origin of correlations in metabolomics data [J].
Camacho, Diogo ;
de la Fuente, Alberto ;
Mendes, Pedro .
METABOLOMICS, 2005, 1 (01) :53-63
[6]   Integrative biological analysis of the APOE*3-Leiden transgenic mouse [J].
Clish, CB ;
Davidov, E ;
Oresic, M ;
Plasterer, TN ;
Lavine, G ;
Londo, T ;
Meys, M ;
Snell, P ;
Stochaj, W ;
Adourian, A ;
Zhang, X ;
Morel, N ;
Neumann, E ;
Verheij, E ;
Vogels, JTWE ;
Havekes, LM ;
Afeyan, N ;
Regnier, F ;
Van Der Greef, J ;
Naylor, S .
OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY, 2004, 8 (01) :3-13
[7]   Bagging to improve the accuracy of a clustering procedure [J].
Dudoit, S ;
Fridlyand, J .
BIOINFORMATICS, 2003, 19 (09) :1090-1099
[8]   Deciphering metabolic networks [J].
Fiehn, O ;
Weckwerth, W .
EUROPEAN JOURNAL OF BIOCHEMISTRY, 2003, 270 (04) :579-588
[9]   Metabolomics and systems biology: making sense of the soup [J].
Kell, DB .
CURRENT OPINION IN MICROBIOLOGY, 2004, 7 (03) :296-307
[10]   Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments [J].
Kerr, MK ;
Churchill, GA .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (16) :8961-8965