Reflections on univariate and multivariate analysis of metabolomics data

Cited by: 467
Authors
Saccenti, Edoardo [1 ,2 ]
Hoefsloot, Huub C. J. [1 ,2 ]
Smilde, Age K. [1 ,2 ]
Westerhuis, Johan A. [1 ,2 ]
Hendriks, Margriet M. W. B. [2 ,3 ]
Affiliations
[1] Univ Amsterdam, Swammerdam Inst Life Sci, Biosyst Data Anal Grp, NL-1098 XH Amsterdam, Netherlands
[2] Netherlands Metabol Ctr, NL-2333 CL Leiden, Netherlands
[3] Leiden Acad Ctr Drug Res, NL-2333 CL Leiden, Netherlands
Keywords
Univariate analysis; Multivariate analysis; Hypothesis testing; Multiple test correction; Overfitting; Consistency at large; NMR-BASED METABOLOMICS; STATISTICAL VALIDATION; DISCRIMINANT-ANALYSIS; SHRUNKEN CENTROIDS; POWERFUL APPROACH; FEATURE-SELECTION; HIGHER CRITICISM; GENE-EXPRESSION; DATA SETS; CLASSIFICATION;
DOI
10.1007/s11306-013-0598-6
Chinese Library Classification number
R5 [Internal Medicine]
Subject classification codes
1002; 100201
Abstract
Metabolomics experiments usually produce large quantities of data. Univariate and multivariate analysis techniques are routinely used to extract relevant information from these data, with the aim of providing biological knowledge about the problem studied. Although statistical tools such as the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis form the statistical backbone of the vast majority of metabolomics papers, many basic but fundamental questions are still often asked: Why do the results of univariate and multivariate analyses differ? Why apply univariate methods if a multivariate method has already been applied? Why can something be seen multivariately when it is not seen univariately? In the present paper we address some aspects of univariate and multivariate analysis, with the aim of clarifying in simple terms the main differences between the two approaches. Applications of the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis are shown on both real and simulated metabolomics data examples to provide an overview of fundamental aspects of univariate and multivariate methods.
Pages: 361-374
Page count: 14
Related papers
64 in total
[41]   Metabolic Changes in Urine during and after Pregnancy in a Large, Multiethnic Population-Based Cohort Study of Gestational Diabetes [J].
Sachse, Daniel ;
Sletner, Line ;
Morkrid, Kjersti ;
Jenum, Anne Karen ;
Birkeland, Kare I. ;
Rise, Frode ;
Piehler, Armin P. ;
Berg, Jens Petter .
PLOS ONE, 2012, 7 (12)
[42]   A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics [J].
Schäfer, J ;
Strimmer, K .
STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2005, 4 :1-30
[43]  
Schneeweiss H., 1993, CONSISTENCY LARGE MO
[44]   Pair-wise multicomparison and OPLS analyses of cold-acclimation phases in Siberian spruce [J].
Shiryaeva, Liudmila ;
Antti, Henrik ;
Schroder, Wolfgang P. ;
Strimbeck, Richard ;
Shiriaev, Anton S. .
METABOLOMICS, 2012, 8 (01) :S123-S130
[45]  
Sokal RR., 2012, BIOMETRY, V4rd
[46]   A direct approach to false discovery rates [J].
Storey, JD .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2002, 64 :479-498
[47]   Statistical significance for genomewide studies [J].
Storey, JD ;
Tibshirani, R .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2003, 100 (16) :9440-9445
[48]  
Szymanska E., 2011, METABOLOMICS S1, V8, P3
[49]   A lipidomic analysis approach to evaluate the response to cholesterol-lowering food intake [J].
Szymanska, Ewa ;
van Dorsten, Ferdinand A. ;
Troost, Jorne ;
Paliukhovich, Iryna ;
van Velzen, Ewoud J. J. ;
Hendriks, Margriet M. W. B. ;
Trautwein, Elke A. ;
van Duynhoven, John P. M. ;
Vreeken, Rob J. ;
Smilde, Age K. .
METABOLOMICS, 2012, 8 (05) :894-906
[50]   Quick and easy implementation of the Benjamini-Hochberg procedure for controlling the false positive rate in multiple comparisons [J].
Thissen, D ;
Steinberg, L ;
Kuang, D .
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2002, 27 (01) :77-83