PLS-Based and Regularization-Based Methods for the Selection of Relevant Variables in Non-targeted Metabolomics Data

Cited by: 50
Authors
Bujak, Renata [1]
Daghir-Wojtkowiak, Emilia [1]
Kaliszan, Roman [1]
Markuszewski, Michal J. [1]
Affiliations
[1] Med Univ Gdansk, Dept Biopharmaceut & Pharmacodynam, Gdansk, Poland
Keywords
statistical analysis; non-targeted metabolomics; mass spectrometry; orthogonal projections to latent structures-discriminant analysis; least absolute shrinkage and selection operator; ORTHOGONAL SIGNAL CORRECTION; PARTIAL LEAST-SQUARES; STATISTICAL-ANALYSIS; MODELS; METABOLISM; STRATEGIES; REGRESSION; SHRINKAGE;
DOI
10.3389/fmolb.2016.00035
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Non-targeted metabolomics is a part of systems biology and aims to determine numerous metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are high-dimensional owing to the sensitivity of mass spectrometry-based detection methods and the complexity of biological matrices. Therefore, proper selection of the variables that contribute to group classification is a crucial step, especially in metabolomics studies focused on searching for disease biomarker candidates. In the present study, three statistical approaches were tested using two metabolomics datasets (the RH and PH studies). Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction, as well as the least absolute shrinkage and selection operator (LASSO) with bootstrapping, were tested and compared. For the RH study, the OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on the VIP criteria using Pareto and UV scaling, respectively. For the PH study, 217 and 320 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH study, the OPLS-DA model built after correcting for multiple testing selected 4 and 19 variables using Pareto and UV scaling, respectively. For the PH study, 14 and 18 variables were selected based on the VIP criteria using Pareto and UV scaling, respectively. In the RH and PH studies, the LASSO selected 14 and 4 variables, respectively, with reproducibility between 99.3 and 100%. For PLS-based models, the larger the search space, the higher the probability of developing models that fit the training data well yet perform poorly on the validation set. The LASSO offers potential improvements over standard linear regression owing to its constraint, which promotes sparse solutions. This paper is the first to date to use LASSO-penalized logistic regression in untargeted metabolomics studies.
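
The abstract names two preprocessing choices (Pareto and UV scaling) and a LASSO-penalized logistic regression assessed by bootstrapping. The sketch below illustrates those ideas on generic data and is not the authors' code. It rests on several assumptions: UV scaling is taken to mean unit-variance scaling (mean-center, divide by the standard deviation), Pareto scaling divides by the square root of the standard deviation, "reproducibility" is read as the fraction of bootstrap resamples in which a variable keeps a nonzero coefficient, and a plain proximal-gradient (ISTA) solver stands in for whatever software the study used. All function names and the penalty value `lam` are hypothetical.

```python
import numpy as np

def uv_scale(X):
    """Unit-variance scaling: mean-center, then divide by the standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Pareto scaling: mean-center, then divide by the square root of the standard deviation."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def soft_threshold(w, t):
    """Proximal operator of the L1 norm; shrinks coefficients toward exact zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    The intercept b is not penalized. y must contain 0/1 labels.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    # Step size 1/L, with L an upper bound on the Lipschitz constant of the
    # gradient of the mean logistic loss (intercept column included).
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        resid = 1.0 / (1.0 + np.exp(-z)) - y
        w = soft_threshold(w - step * (X.T @ resid) / n, step * lam)
        b -= step * resid.mean()
    return w, b

def bootstrap_selection_frequency(X, y, lam, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each variable has a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        w, _ = lasso_logistic(X[idx], y[idx], lam)
        counts += (w != 0.0)
    return counts / n_boot

if __name__ == "__main__":
    # Synthetic example (assumed data): only the first two variables carry signal.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 30))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    freq = bootstrap_selection_frequency(uv_scale(X), y, lam=0.1)
    print(np.round(freq, 2))
```

Because the L1 penalty sets many coefficients exactly to zero, the selection frequency gives a simple way to see which variables are chosen consistently across resamples. In practice `lam` would normally be tuned, for example by cross-validation, rather than fixed as it is here.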
Pages: 10