A Partial Least Squares based algorithm for parsimonious variable selection

Cited by: 72
Authors
Mehmood, Tahir [1]
Martens, Harald [2]
Saebo, Solve [1]
Warringer, Jonas [2, 3]
Snipen, Lars [1]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, As, Norway
[2] Norwegian Univ Life Sci, Ctr Integrat Genet CIGENE Anim & Aquacultural Sci, As, Norway
[3] Univ Gothenburg, Dept Cell & Mol Biol, Gothenburg, Sweden
Source
ALGORITHMS FOR MOLECULAR BIOLOGY | 2011 / Vol. 6
Keywords
NEAR-INFRARED SPECTROSCOPY; DIMENSIONAL GENOMIC DATA; SYNONYMOUS CODON USAGE; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; BACTERIAL GENOME; PLS REGRESSION; ELIMINATION; LATENT; GENE;
DOI
10.1186/1748-7188-6-27
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: In genomics, a commonly encountered problem is to extract a subset of variables from a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems.
Results: We present an algorithm that balances the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables, which significantly improves the understandability of the model. Within the approach we have tested and compared three criteria commonly used for variable selection in the Partial Least Squares modeling paradigm: loading weights, regression coefficients, and variable importance on projections. The algorithm is applied to the problem of identifying codon variations that discriminate between bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection.
Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared with standard approaches.
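The elimination idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-component PLS1 fit, ranks variables by absolute loading weight only, uses a single hold-out set, and replaces whatever acceptance rule the paper uses with a fixed error tolerance. The names `fit_pls1`, `error_rate`, `eliminate`, `frac`, and `tol` are hypothetical, and the data are synthetic.

```python
import math
import random

def fit_pls1(X, y, cols):
    """One-component PLS1 on the columns in `cols` (simplifying assumption)."""
    n = len(X)
    xm = [sum(row[j] for row in X) / n for j in cols]
    ym = sum(y) / n
    # Loading weights: w is proportional to Xc^T yc on centered data.
    w = [sum((X[i][j] - xm[k]) * (y[i] - ym) for i in range(n))
         for k, j in enumerate(cols)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]
    # Scores t = Xc w and y-loading q = t^T yc / t^T t.
    t = [sum((X[i][j] - xm[k]) * w[k] for k, j in enumerate(cols))
         for i in range(n)]
    tt = sum(v * v for v in t) or 1.0
    q = sum(t[i] * (y[i] - ym) for i in range(n)) / tt
    return w, xm, ym, q

def error_rate(model, X, y, cols):
    """Fraction of misclassified rows for 0/1 labels, thresholding at 0.5."""
    w, xm, ym, q = model
    wrong = 0
    for row, label in zip(X, y):
        score = sum((row[j] - xm[k]) * w[k] for k, j in enumerate(cols))
        pred = 1 if ym + q * score > 0.5 else 0
        wrong += pred != label
    return wrong / len(y)

def eliminate(X_tr, y_tr, X_va, y_va, frac=0.25, tol=0.05):
    """Stepwise elimination; returns (selected columns, error, full path)."""
    cols = list(range(len(X_tr[0])))
    path = []
    while True:
        model = fit_pls1(X_tr, y_tr, cols)
        path.append((list(cols), error_rate(model, X_va, y_va, cols)))
        if len(cols) == 1:
            break
        # Drop the fraction of variables with the smallest |loading weight|.
        n_drop = max(1, int(frac * len(cols)))
        order = sorted(range(len(cols)), key=lambda k: abs(model[0][k]))
        dropped = set(order[:n_drop])
        cols = [j for k, j in enumerate(cols) if k not in dropped]
    best = min(err for _, err in path)
    # Parsimony step: smallest variable set within `tol` of the best error.
    selected, err = min((p for p in path if p[1] <= best + tol),
                        key=lambda p: len(p[0]))
    return selected, err, path

# Synthetic data (assumption): only variables 0 and 1 carry signal.
rng = random.Random(0)

def make_data(n, p=20):
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y

X_tr, y_tr = make_data(80)
X_va, y_va = make_data(40)
selected, err, path = eliminate(X_tr, y_tr, X_va, y_va)
print("selected variables:", selected, "hold-out error:", err)
```

The `tol` parameter encodes the trade-off named in the abstract: accepting a marginally higher error in exchange for a smaller variable set. Setting `tol=0` returns the smallest set that attains the lowest hold-out error on the path.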
Pages: 12
Related papers
50 in total
  • [41] A gradient descent boosting spectrum modeling method based on back interval partial least squares
    Ren, Dong
    Qu, Fangfang
    Lv, Ke
    Zhang, Zhong
    Xu, Honglei
    Wang, Xiangyu
    NEUROCOMPUTING, 2016, 171 : 1038 - 1046
  • [42] FEATURE SELECTION/VISUALISATION OF ADNI DATA WITH ITERATIVE PARTIAL LEAST SQUARES
    Wasterlid, Torbjorn
    Bai, Li
    2014 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIG DATA (CIBD), 2014, : 46 - 53
  • [43] Performance evaluation of variable selection methods coupled with partial least squares regression to determine the target component in solid samples
    Zhao, Na
    Wu, Zhisheng
    Wu, Chunying
    Wang, Shuyu
    Zhan, Xueyan
    JOURNAL OF NEAR INFRARED SPECTROSCOPY, 2022, 30 (04) : 171 - 178
  • [44] Variable selection for partial least-squares calibration of near-infrared data from orthogonally designed experiments
    Setarehdan, SK
    Soraghan, JJ
    Littlejohn, D
    Sadler, DA
    APPLIED SPECTROSCOPY, 2002, 56 (03) : 337 - 345
  • [45] Marginal Screening for Partial Least Squares Regression
    Zhao, Naifei
    Xu, Qingsong
    Wang, Hong
    IEEE ACCESS, 2017, 5 : 14047 - 14055
  • [46] Spectral variable selection based on least absolute shrinkage and selection operator with ridge-adding homotopy
    Li, Haoran
    Dai, Jisheng
    Xiao, Jianbo
    Zou, Xiaobo
    Chen, Tao
    Holmes, Melvin
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2022, 221
  • [47] SIMULTANEOUS SPECTROPHOTOMETRIC DETERMINATION OF DIPHENHYDRAMINE, BENZONATATE, GUAIFENESIN AND PHENYLEPHRINE IN THEIR QUATERNARY MIXTURE USING PARTIAL LEAST SQUARES WITH AND WITHOUT GENETIC ALGORITHM AS A POWERFUL VARIABLE SELECTION PROCEDURE
    Darwish, H. W.
    Metwally, F. H.
    Bayoumi, A. El.
    DIGEST JOURNAL OF NANOMATERIALS AND BIOSTRUCTURES, 2014, 9 (04) : 1359 - 1372
  • [48] APPI(+)-FTICR mass spectrometry coupled to partial least squares with genetic algorithm variable selection for prediction of API gravity and CCR of crude oil and vacuum residues
    Palacio Lozano, Diana Catalina
    Armando Orrego-Ruiz, Jorge
    Cabanzo Hernandez, Rafael
    Enrique Guerrero, Jader
    Mejia-Ospino, Enrique
    FUEL, 2017, 193 : 39 - 44
  • [49] Partial discharge pattern recognition method based on variable predictive model-based class discriminate and partial least squares regression
    Zhu, Yongli
    Jia, Yafei
    Wang, Liuwang
    IET SCIENCE MEASUREMENT & TECHNOLOGY, 2016, 10 (07) : 737 - 744
  • [50] Vis-NIR spectrometric determination of Brix and sucrose in sugar production samples using kernel partial least squares with interval selection based on the successive projections algorithm
    de Almeida, Valber Elias
    Gomes, Adriano de Araujo
    de Sousa Fernandes, David Douglas
    Casimiro Goicoechea, Hector
    Harrop Galvao, Roberto Kawakami
    Ugulino Araujo, Mario Cesar
    TALANTA, 2018, 181 : 38 - 43