Comparison of some chemometric tools for metabonomics biomarker identification

Cited by: 48
Authors
Rousseau, Rejane [1]
Govaerts, Bernadette [1]
Verleysen, Michel [1, 2]
Boulanger, Bruno
Affiliations
[1] Catholic Univ Louvain, Inst Stat, B-1348 Louvain, Belgium
[2] Catholic Univ Louvain, Machine Learning Grp, DICE, B-3000 Louvain, Belgium
Keywords
metabonomics; multivariate statistics; variable selection; biomarker identification; H-1 NMR spectroscopy
DOI
10.1016/j.chemolab.2007.06.008
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
NMR-based metabonomics discovery approaches require statistical methods to extract, from large and complex spectral databases, biomarkers or biologically significant variables that best represent defined biological conditions. This paper explores the respective effectiveness of six multivariate methods: multiple hypotheses testing, supervised extensions of principal component analysis (PCA) and independent component analysis (ICA), discriminant partial least squares, linear logistic regression and classification trees. Each method has been adapted to provide a biomarker score for each zone of the spectrum. These scores are intended to indicate to the biologist which metabolites of the analyzed biofluid are potentially affected by a stressor factor of interest (e.g. toxicity of a drug, presence of a given disease or therapeutic effect of a drug). Applying the six methods to samples of 60 and 200 spectra drawn from a semi-artificial database made it possible to evaluate their respective properties. In particular, their sensitivities and false discovery rates (FDR) are illustrated through receiver operating characteristic (ROC) curves, and the resulting identifications are used to show their particular characteristics and relative advantages. The paper recommends discarding two methods for biomarker identification: PCA, which shows generally low efficiency, and classification trees (CART), which are very sensitive to noise. The other four methods give promising results, each with its own particular characteristics. (C) 2007 Elsevier B.V. All rights reserved.
Pages: 54-66
Page count: 13
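
Illustrative sketch
The abstract describes giving each spectral zone a biomarker score and then judging a method by its sensitivity and false discovery rate. The Python sketch below illustrates that idea for the simplest case only, a per-zone two-group test statistic. It is not the authors' implementation: the function names, the simulated Gaussian data, the group size of 30 spectra per condition, the 200 zones, the biomarker positions and the shift size are all assumptions chosen for illustration, and the absolute Welch t-statistic stands in for whatever score the paper actually uses.

```python
import random
import statistics


def simulate(n_per_group=30, n_zones=200, biomarkers=(10, 50, 120),
             shift=3.0, seed=0):
    """Simulate control and treated spectra (hypothetical data, not the paper's).

    Every zone is standard Gaussian noise; in the treated group the zones
    listed in `biomarkers` are shifted by `shift`.
    """
    rng = random.Random(seed)
    control = [[rng.gauss(0.0, 1.0) for _ in range(n_zones)]
               for _ in range(n_per_group)]
    treated = [[rng.gauss(shift if j in biomarkers else 0.0, 1.0)
                for j in range(n_zones)]
               for _ in range(n_per_group)]
    return control, treated, set(biomarkers)


def welch_t_scores(control, treated):
    """Return one score per zone: the absolute Welch t-statistic."""
    scores = []
    for j in range(len(control[0])):
        a = [row[j] for row in control]
        b = [row[j] for row in treated]
        # Standard error of the difference in means, unequal variances.
        se = (statistics.variance(a) / len(a)
              + statistics.variance(b) / len(b)) ** 0.5
        scores.append(abs(statistics.mean(b) - statistics.mean(a)) / se)
    return scores


def sensitivity_and_fdr(scores, true_biomarkers, k):
    """Select the k highest-scoring zones and compare them with the truth.

    Sensitivity is the fraction of true biomarkers that were selected;
    FDR is the fraction of selected zones that are not true biomarkers.
    """
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected = set(ranked[:k])
    true_pos = len(selected & true_biomarkers)
    return true_pos / len(true_biomarkers), (k - true_pos) / k


if __name__ == "__main__":
    control, treated, truth = simulate()
    scores = welch_t_scores(control, treated)
    # Sweeping k traces out one (FDR, sensitivity) point per cut-off.
    for k in (1, 3, 6, 12):
        sens, fdr = sensitivity_and_fdr(scores, truth, k)
        print(f"k={k:2d}  sensitivity={sens:.2f}  FDR={fdr:.2f}")
```

Sweeping the number of selected zones, as in the loop at the end, yields the sensitivity and FDR pairs from which a ROC-style curve could be drawn. Known biomarker positions are only available here because the data are simulated, which mirrors the role the abstract assigns to the semi-artificial database.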