Background: In genomics, a common problem is to extract a subset of variables from a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is identifying associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Such problems raise three issues: achieving maximum understandability with the smallest number of selected variables, keeping the selected variables consistent, and limiting the variation of model performance on test data.

Results: We present an algorithm that balances the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables, which significantly improves the understandability of the model. Within the approach we have tested and compared three criteria commonly used for variable selection in the Partial Least Squares modeling paradigm: loading weights, regression coefficients, and variable importance in projection (VIP). The algorithm is applied to the problem of identifying codon variations that discriminate between bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection.

Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared to standard approaches.
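The elimination idea described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-component PLS1 model, a regression response scored by mean squared error on a validation split (the paper works with classification error), loading weights as the only ranking criterion, and synthetic data. The names `fit_pls1`, `eliminate`, `make_data`, `drop_fraction`, and `tolerance` are hypothetical and chosen only for this illustration.

```python
import math
import random


def fit_pls1(X, y, cols):
    """Fit a one-component PLS1 model on the columns listed in `cols`."""
    n = len(X)
    x_mean = {j: sum(row[j] for row in X) / n for j in cols}
    y_mean = sum(y) / n
    # Loading weights: covariance direction between each centered variable
    # and the centered response, scaled to unit length.
    w = {j: sum((row[j] - x_mean[j]) * (yi - y_mean) for row, yi in zip(X, y))
         for j in cols}
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
    w = {j: v / norm for j, v in w.items()}
    # Scores t = Xw, then regress y on t to obtain the single y-loading q.
    t = [sum((row[j] - x_mean[j]) * w[j] for j in cols) for row in X]
    tt = sum(ti * ti for ti in t) or 1.0
    q = sum(ti * (yi - y_mean) for ti, yi in zip(t, y)) / tt
    return x_mean, y_mean, w, q


def predict(model, X, cols):
    x_mean, y_mean, w, q = model
    return [y_mean + q * sum((row[j] - x_mean[j]) * w[j] for j in cols)
            for row in X]


def mse(pred, y):
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)


def eliminate(X_tr, y_tr, X_va, y_va, drop_fraction=0.25, tolerance=0.05):
    """Backward elimination; returns (selected_cols, selected_err, path)."""
    cols = list(range(len(X_tr[0])))
    path = []  # (columns, validation error) for every model along the way
    while True:
        model = fit_pls1(X_tr, y_tr, cols)
        path.append((cols, mse(predict(model, X_va, cols), y_va)))
        if len(cols) == 1:
            break
        w = model[2]
        # Drop the fraction of variables with the smallest |loading weight|.
        n_drop = max(1, int(len(cols) * drop_fraction))
        cols = sorted(sorted(cols, key=lambda j: abs(w[j]))[n_drop:])
    best = min(err for _, err in path)
    # Accept a marginal loss: smallest model within `tolerance` of the best.
    ok = [(c, e) for c, e in path if e <= best + tolerance]
    selected, err = min(ok, key=lambda ce: len(ce[0]))
    return selected, err, path


def make_data(n, p=10, seed=0):
    """Synthetic data (assumption): only variables 0 and 1 drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 3 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


X_tr, y_tr = make_data(200, seed=1)
X_va, y_va = make_data(100, seed=2)
selected, err, path = eliminate(X_tr, y_tr, X_va, y_va)
print("selected variables:", selected, "validation MSE:", round(err, 3))
```

The final step is where parsimony is traded against performance: instead of returning the model with the lowest validation error, the sketch returns the smallest model whose error lies within `tolerance` of that minimum. With a single component the ranking by loading weights does not change between refits; a model with several components, or a ranking by regression coefficients or VIP, would generally reorder the variables as the set shrinks.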