Informative metabolites identification by variable importance analysis based on random variable combination

Cited by: 0
Authors
Yong-Huan Yun
Fu Liang
Bai-Chuan Deng
Guang-Bi Lai
Carlos M. Vicente Gonçalves
Hong-Mei Lu
Jun Yan
Xin Huang
Lun-Zhao Yi
Yi-Zeng Liang
Affiliations
[1] Central South University, College of Chemistry and Chemical Engineering
[2] Heilongjiang University of Chinese Medicine
[3] University of Bergen, Department of Chemistry, Faculty of Mathematics and Natural Sciences
[4] Kunming University of Science and Technology, Yunnan Food Safety Research Institute
Source
Metabolomics | 2015 / Vol. 11
Keywords
Variable importance; Combination effect; Informative metabolites; Partial least squares-linear discriminant analysis
DOI
Not available
Abstract
The main goal of metabolomics research is to reveal informative metabolites or biomarkers, a task that can be framed as variable selection. Several measures, such as regression coefficients (RC), weights, and variable importance in projection (VIP), have been widely used to assess variable importance when building a partial least squares-linear discriminant analysis (PLS-LDA) classification model. A set of metabolites can then be selected by applying a threshold to the resulting ranking. However, these measures do not account for the combination effect among a subset of variables, which can bias the results. In this work, a strategy named variable importance analysis based on random variable combination (VIAVC) is developed for the statistical assessment of variable importance. The VIAVC framework has three main parts: (1) a novel variable sampling method, called binary matrix resampling, guarantees that each variable is selected with the same probability and generates a population of different variable combinations; (2) the importance of each variable is assessed by the percent decrease or increase in the area under the receiver operating characteristic curve (AUC) when the variable is excluded from PLS-LDA modeling; (3) variables are iteratively retained, and the rank of the final remaining informative variables is output. Applications to three metabolic datasets illustrate that VIAVC performs better than other methods, including RC, VIP, and subwindow permutation analysis. The MATLAB code implementing VIAVC is available in the supplemental materials.
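The first two parts of the framework can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code, and several pieces are assumptions made for illustration: the function names, the `ratio` and `seed` parameters, the use of a plain AUC difference rather than a percent change, and above all `stand_in_scores`, a trivial placeholder where the paper uses a PLS-LDA model. The sampling step assumes that "the same probability" is achieved by giving every column of the binary matrix the same number of ones and shuffling each column independently.

```python
import random


def binary_matrix_sampling(n_rows, n_vars, ratio=0.5, seed=0):
    """Return an n_rows x n_vars 0/1 matrix in which every column has the
    same number of ones, so every variable is sampled equally often."""
    rng = random.Random(seed)
    n_ones = round(n_rows * ratio)
    columns = []
    for _ in range(n_vars):
        col = [1] * n_ones + [0] * (n_rows - n_ones)
        rng.shuffle(col)  # permute each column independently
        columns.append(col)
    # Transpose so that each row is one variable combination.
    return [list(row) for row in zip(*columns)]


def auc(scores, labels):
    """Area under the ROC curve (AUC), computed as the fraction of
    positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stand_in_scores(X, mask):
    """Hypothetical placeholder for PLS-LDA: score each sample by the sum
    of its selected variables. Replace with a real classifier in practice."""
    return [sum(x for x, m in zip(row, mask) if m) for row in X]


def auc_change_on_exclusion(X, y, matrix, j):
    """Mean AUC difference (with variable j minus without it) over all
    sampled combinations that contain variable j."""
    diffs = []
    for mask in matrix:
        if not mask[j]:
            continue
        without = list(mask)
        without[j] = 0
        if not any(without):  # nothing left to model with
            continue
        diffs.append(auc(stand_in_scores(X, mask), y)
                     - auc(stand_in_scores(X, without), y))
    return sum(diffs) / len(diffs) if diffs else 0.0
```

A positive value from `auc_change_on_exclusion` means that removing the variable lowers the AUC on average, which is the sense in which a variable would be called informative here. The third part of the framework, iterative retention, would repeat the sampling on the surviving variables.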
Pages: 1539-1551
Number of pages: 12