MetaFIND: A feature analysis tool for metabolomics data

Times cited: 27
Authors
Bryan, Kenneth [1]
Brennan, Lorraine [2]
Cunningham, Padraig [1]
Affiliations
[1] Univ Coll Dublin, CASL, Dublin, Ireland
[2] Univ Coll Dublin, Conway Inst, Sch Agr Food Sci & Vet Med, Dublin, Ireland
Funding
Science Foundation Ireland
DOI
10.1186/1471-2105-9-470
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Metabolomics, or metabonomics, refers to the quantitative analysis of all metabolites present within a biological sample and is generally carried out using NMR spectroscopy or mass spectrometry. Such analysis produces a set of peaks, or features, indicative of the metabolic composition of the sample and may be used as a basis for sample classification. Feature selection may be employed to improve classification accuracy or aid model explanation by establishing a subset of class-discriminating features. Factors such as experimental noise, choice of technique and threshold selection may adversely affect the set of selected features retrieved. Furthermore, the high dimensionality and multi-collinearity inherent within metabolomics data may exacerbate discrepancies between the set of features retrieved and those required to provide a complete explanation of metabolite signatures. Given these issues, the latter in particular, we present the MetaFIND application for 'post-feature selection' correlation analysis of metabolomics data.
Results: In our evaluation we show how MetaFIND may be used to elucidate metabolite signatures from the set of features selected by diverse techniques over two metabolomics datasets. Importantly, we also show how MetaFIND may augment standard feature selection and aid the discovery of additional significant features, including those which represent novel class-discriminating metabolites. MetaFIND also supports the discovery of higher-level metabolite correlations.
Conclusion: Standard feature selection techniques may fail to capture the full set of relevant features in the case of high-dimensional, multi-collinear metabolomics data. We show that the MetaFIND 'post-feature selection' analysis tool may aid metabolite signature elucidation, feature discovery and inference of metabolic correlations.
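The abstract describes MetaFIND only at a high level. As a rough illustration of the idea behind 'post-feature selection' correlation analysis, the sketch below takes the indices of features already chosen by some feature selection method. For each one, it lists every other feature whose Pearson correlation with that selected feature meets a cutoff. This is one simple way to surface collinear peaks that may belong to the same metabolite signature. It is not MetaFIND's implementation. The function names `pearson` and `correlated_features`, the choice of Pearson correlation, and the default 0.8 threshold are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def correlated_features(data, selected, threshold=0.8):
    """Hypothetical post-selection step (not MetaFIND's actual algorithm).

    data: list of samples, each a list of feature intensities of equal length.
    selected: indices of features returned by some feature selection method.
    Returns {selected_index: [(other_index, r), ...]} for |r| >= threshold,
    strongest correlations first.
    """
    n_features = len(data[0])
    # Transpose samples x features into one column per feature.
    columns = [[row[j] for row in data] for j in range(n_features)]
    result = {}
    for s in selected:
        partners = []
        for j in range(n_features):
            if j == s:
                continue
            r = pearson(columns[s], columns[j])
            if abs(r) >= threshold:
                partners.append((j, round(r, 3)))
        partners.sort(key=lambda p: -abs(p[1]))  # stable sort keeps index order on ties
        result[s] = partners
    return result


if __name__ == "__main__":
    # Toy data: 5 samples x 4 features. Feature 1 rises with feature 0,
    # feature 2 falls with it, and feature 3 is only weakly related (r ~ 0.35).
    toy = [[1, 2, 5, 3], [2, 4, 4, 1], [3, 6, 3, 4], [4, 8, 2, 1], [5, 10, 1, 5]]
    print(correlated_features(toy, [0]))  # {0: [(1, 1.0), (2, -1.0)]}
```

Reporting negative as well as positive correlations matters here. Peaks from metabolites that move in opposite directions between classes can both be informative, so the sketch filters on |r| rather than r.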
Pages: 13