Exploring the analysis of structured metabolomics data

Cited by: 25
Authors
Verouden, Maikel P. H. [1]
Westerhuis, Johan A. [1]
van der Werf, Mariet J. [2]
Smilde, Age K. [1]
Affiliations
[1] Univ Amsterdam, Swammerdam Inst Life Sci, NL-1018 WV Amsterdam, Netherlands
[2] TNO Qual Life, NL-3700 AJ Zeist, Netherlands
Keywords
PCA; ASCA; Experimental design; Overfit; Microbial metabolomics; Gas chromatography/mass spectrometry; Network component analysis; Arabidopsis thaliana; PLS regression; Stress
DOI
10.1016/j.chemolab.2009.05.004
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In metabolomics research, a large number of metabolites are measured that reflect the cellular state under the experimental conditions studied. In many cases the experiments follow an experimental design to ensure that sufficient variation is induced in the metabolite concentrations. However, because metabolomics is a holistic approach, many metabolites are also measured in which the experimental design induces no variation. The presence of such non-induced metabolites hampers traditional data analysis methods such as PCA in estimating the true model of the induced variation. The greediness of PCA leads to a clear overfit of the metabolomics data and can lead to a poor selection of important metabolites. In this paper we explore how, why, and how severely PCA overfits data with an underlying experimental design. Recently, new data analysis methods have been introduced that can use prior information about the system to reduce the overfit. We show that incorporating prior knowledge of the system under investigation leads to a better estimate of the true underlying structure and to less overfit. The experimental design information is used together with ASCA to improve the analysis of metabolomics data. To demonstrate the improved model estimation of ASCA, a thorough simulation study is carried out and the results are extended to a microbial metabolomics batch fermentation study. The ASCA model is much less affected by non-induced variation and measurement error than PCA, leading to a much better model of the induced variation. © 2009 Elsevier B.V. All rights reserved.
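The contrast drawn in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not reproduce their simulation study. It assumes a single two-level factor, a balanced design, independent Gaussian noise, and a one-factor ASCA (ANOVA-simultaneous component analysis) in which the level means of the centred data form the effect matrix. All function names and parameter values (`simulate`, `n_induced`, `effect`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

def simulate(rng, n_rep=6, n_metab=200, n_induced=5, effect=1.0, noise_sd=1.0):
    """Two-level design; only the first n_induced metabolites respond (assumed setup)."""
    levels = np.repeat([0, 1], n_rep)              # factor level of each sample
    means = np.zeros(n_metab)
    means[:n_induced] = effect                     # induced metabolites only
    sign = np.where(levels == 0, -1.0, 1.0)        # level 0 -> -effect, level 1 -> +effect
    X = np.outer(sign, means) + noise_sd * rng.standard_normal((levels.size, n_metab))
    true_loading = means / np.linalg.norm(means)   # unit-length true direction
    return X, levels, true_loading

def first_loading(M):
    """First right singular vector of an already column-centred matrix."""
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]

def pca_loading(X):
    """Ordinary PCA: ignores the design and decomposes all centred variation."""
    return first_loading(X - X.mean(axis=0))

def asca_loading(X, levels):
    """One-factor ASCA sketch: PCA on the matrix of level means of the centred data."""
    Xc = X - X.mean(axis=0)
    X_effect = np.zeros_like(Xc)
    for lev in np.unique(levels):
        rows = levels == lev
        X_effect[rows] = Xc[rows].mean(axis=0)     # replace each sample by its level mean
    return first_loading(X_effect)

def alignment(v, v_true):
    """Absolute cosine between an estimated and the true loading (both unit length)."""
    return abs(float(v @ v_true))

def compare(n_runs=200, seed=0):
    """Average alignment of PCA and ASCA loadings with the truth over repeated draws."""
    rng = np.random.default_rng(seed)
    pca, asca = [], []
    for _ in range(n_runs):
        X, levels, v_true = simulate(rng)
        pca.append(alignment(pca_loading(X), v_true))
        asca.append(alignment(asca_loading(X, levels), v_true))
    return float(np.mean(pca)), float(np.mean(asca))

mean_pca, mean_asca = compare()
print(f"mean |cos| with true loading: PCA={mean_pca:.2f}, ASCA={mean_asca:.2f}")
```

Under these assumptions, the first PCA component always captures at least as much of the centred data's variance as the ASCA direction does, because PCA maximises explained variance by construction. That is the greediness the abstract refers to. With many non-induced metabolites, the extra variance is largely noise, so the ASCA loading tends to lie closer to the true induced direction.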
Pages: 88-96
Page count: 9