A Partial Least Squares based algorithm for parsimonious variable selection

Cited by: 72
Authors
Mehmood, Tahir [1]
Martens, Harald [2]
Saebo, Solve [1]
Warringer, Jonas [2, 3]
Snipen, Lars [1]
Affiliations
[1] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, Ås, Norway
[2] Norwegian Univ Life Sci, Ctr Integrat Genet CIGENE Anim & Aquacultural Sci, Ås, Norway
[3] Univ Gothenburg, Dept Cell & Mol Biol, Gothenburg, Sweden
Source
ALGORITHMS FOR MOLECULAR BIOLOGY | 2011 / Vol. 6
Keywords
NEAR-INFRARED SPECTROSCOPY; DIMENSIONAL GENOMIC DATA; SYNONYMOUS CODON USAGE; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; BACTERIAL GENOME; PLS REGRESSION; ELIMINATION; LATENT; GENE;
DOI
10.1186/1748-7188-6-27
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: In genomics, a commonly encountered problem is to extract a subset of variables out of a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems.

Results: We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables. This significantly improves the understandability of the model. Within the approach we have tested and compared three different criteria commonly used in the Partial Least Squares modeling paradigm for variable selection: loading weights, regression coefficients and variable importance on projections. The algorithm is applied to a problem of identifying codon variations discriminating different bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection.

Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared to standard approaches.
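The elimination idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it simplifies several things as assumptions: it uses a one-component PLS1 fit, ranks variables by absolute loading weight only, scores models on a single hold-out split, and replaces the paper's regularization with a hypothetical relative `tolerance` on the test error. The names `fit_pls1`, `regularized_elimination`, `drop_frac` and `tolerance` are illustrative and do not come from the paper.

```python
import random

def fit_pls1(X, y):
    """One-component PLS1 fit. Returns (x_means, y_mean, w, q)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Loading weights: covariance direction X^T y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    # Scores t = Xc w and regression of y on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / (sum(v * v for v in t) or 1.0)
    return x_means, y_mean, w, q

def mse(model, X, y):
    """Mean squared prediction error of a fitted model on (X, y)."""
    x_means, y_mean, w, q = model
    preds = [y_mean + q * sum((row[j] - x_means[j]) * w[j] for j in range(len(w)))
             for row in X]
    return sum((a - b) ** 2 for a, b in zip(preds, y)) / len(y)

def regularized_elimination(X_tr, y_tr, X_te, y_te, drop_frac=0.25, tolerance=0.10):
    """Backward elimination; returns (selected variable indices, test error)."""
    active = list(range(len(X_tr[0])))
    path = []
    while True:
        sub_tr = [[row[j] for j in active] for row in X_tr]
        sub_te = [[row[j] for j in active] for row in X_te]
        model = fit_pls1(sub_tr, y_tr)
        path.append((list(active), mse(model, sub_te, y_te)))
        if len(active) == 1:
            break
        # Drop the variables with the smallest absolute loading weights.
        order = sorted(range(len(active)), key=lambda k: abs(model[2][k]))
        n_drop = min(max(1, int(drop_frac * len(active))), len(active) - 1)
        dropped = set(order[:n_drop])
        active = [v for k, v in enumerate(active) if k not in dropped]
    # Accept a marginal loss: smallest model within the tolerance of the best error.
    limit = min(err for _, err in path) * (1 + tolerance)
    return min((s for s in path if s[1] <= limit), key=lambda s: len(s[0]))

# Synthetic demo (assumed data): only variables 0 and 1 carry signal.
random.seed(0)
def make_data(n, p=10):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[1] + random.gauss(0, 0.1) for r in X]
    return X, y

X_train, y_train = make_data(60)
X_test, y_test = make_data(40)
selected, err = regularized_elimination(X_train, y_train, X_test, y_test)
print(selected, err)
```

Raising `tolerance` trades predictive performance for a smaller selected set, which is the parsimony trade-off the abstract describes. A closer reproduction would use several PLS components, cross-validation, and the other two ranking criteria (regression coefficients and variable importance on projections).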
Pages: 12
Related papers
50 in total
  • [31] Variable Screening for Near Infrared (NIR) Spectroscopy Data Based on Ridge Partial Least Squares Regression
    Zhao, Naifei
    Xu, Qingsong
    Tang, Man-lai
    Wang, Hong
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2020, 23 (08) : 740 - 756
  • [32] Study on Soil Carbon Estimation by On-the-Go Near-Infrared Spectra and Partial Least Squares Regression with Variable Selection
    Shen Zhang-quan
    Lu Bi-hui
    Shan Ying-jie
    Xu Hong-wei
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2013, 33 (07) : 1775 - 1780
  • [33] A sparse partial least squares algorithm based on sure independence screening method
    Xu, Xiangnan
    Cheng, Kian-Kai
    Deng, Lingli
    Dong, Jiyang
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2017, 170 : 38 - 50
  • [34] Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry
    Broadhurst, D
    Goodacre, R
    Jones, A
    Rowland, JJ
    Kell, DB
    ANALYTICA CHIMICA ACTA, 1997, 348 (1-3) : 71 - 86
  • [35] Assessment of partial least-squares calibration and wavelength selection for complex near-infrared spectra
    McShane, MJ
    Cote, GL
    Spiegelman, CH
    APPLIED SPECTROSCOPY, 1998, 52 (06) : 878 - 884
  • [36] Optimization of Parameter Selection for Partial Least Squares Model Development
    Zhao, Na
    Wu, Zhi-sheng
    Zhang, Qiao
    Shi, Xin-yuan
    Ma, Qun
    Qiao, Yan-jiang
    SCIENTIFIC REPORTS, 2015, 5
  • [37] New classical least-squares/partial least-squares hybrid algorithm for spectral analyses
    Haaland, DM
    Melgaard, DK
    APPLIED SPECTROSCOPY, 2001, 55 (01) : 1 - 8
  • [38] The Characteristic Spectral Selection Method Based on Forward and Backward Interval Partial Least Squares
    Qu Fang-fang
    Ren Dong
    Hou Jin-jian
    Zhang Zhong
    Lu An-xiang
    Wang Ji-hua
    Xu Hong-lei
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2016, 36 (02) : 593 - 598
  • [39] Interval partial least squares and moving window partial least squares in determining the enantiomeric composition of tryptophan using UV-Vis spectroscopy
    Jiao, Long
    Bing, Shan
    Zhang, Xiaofeng
    Li, Hua
    JOURNAL OF THE SERBIAN CHEMICAL SOCIETY, 2016, 81 (02) : 209 - 218
  • [40] Genetic algorithm optimisation combined with partial least squares regression and mutual information variable selection procedures in near-infrared quantitative analysis of cotton-viscose textiles
    Durand, A.
    Devos, O.
    Ruckebusch, C.
    Huvenne, J. P.
    ANALYTICA CHIMICA ACTA, 2007, 595 (1-2) : 72 - 79