Variable selection using support vector regression and random forests: A comparative study

Cited by: 20
Authors
Ben Ishak, Anis [1]
Affiliations
[1] Univ Tunis, ISGT, BESTMOD LR99ES04, Le Bardo 2000, Tunisia
Keywords
Variable importance score; variable selection; support vector regression; random forests; nonlinearity; stepwise algorithm; curse of dimensionality; selection bias; DIHYDROFOLATE-REDUCTASE; NEURAL-NETWORKS; GENE SELECTION; CLASSIFICATION; INHIBITION; BOUNDS;
DOI
10.3233/IDA-150795
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Variable selection is crucial for improving interpretation quality and forecasting accuracy. To this end, it is important to choose an effective dimension reduction technique suited to the specific characteristics of the data. In this paper, the problem of variable selection for linear and nonlinear regression is investigated in depth. The curse of dimensionality issue is also addressed. An intensive comparative study is performed between Support Vector Regression (SVR) and Random Forests (RF), first for variable importance assessment and then for variable selection. The main contribution of this work is twofold: to present experimental insights into the efficiency of variable ranking and selection based on SVR and on RF, and to provide a benchmark study that helps researchers choose the appropriate method for their data. Experiments on simulated and real-world datasets have been carried out. Results show that the SVR score Ga is recommended for variable ranking in linear situations, whereas the RF score is preferable in nonlinear cases. Moreover, we found that RF models are more efficient for selecting variables, especially when used with an external score of importance.
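As a rough illustration of what ranking variables by an importance score can look like, the sketch below computes a generic, model-agnostic permutation score: each variable is scored by the mean increase in prediction error after its column is shuffled. This is not the SVR score or the RF score evaluated in the paper. The function names, the stand-in `toy_model`, and the simulated data are all hypothetical and chosen only for illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Score each variable by the mean increase in MSE after shuffling its column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Hypothetical stand-in for a fitted regressor (a trained SVR or RF model
# would take its place); it uses variables 0 and 1 and ignores variable 2.
def toy_model(row):
    return 3.0 * row[0] + row[1] ** 2


# Simulated data (assumption): three uniform inputs, response with small noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [toy_model(row) + data_rng.gauss(0, 0.1) for row in X]

scores = permutation_importance(toy_model, X, y)
# Variable indices sorted from most to least important.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

A variable the model never uses receives a score of zero, so a ranking of this kind can feed a stepwise selection procedure that drops the lowest-scored variables first.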
Pages: 83-104
Number of pages: 22
Related papers
50 in total
  • [1] Variable selection using random forests
    Genuer, Robin
    Poggi, Jean-Michel
    Tuleau-Malot, Christine
    PATTERN RECOGNITION LETTERS, 2010, 31 (14) : 2225 - 2236
  • [2] Variable selection using random forests
    Sandri, Marco
    Zuccolotto, Paola
    DATA ANALYSIS, CLASSIFICATION AND THE FORWARD SEARCH, 2006, : 263 - +
  • [3] SUPPORT VECTOR REGRESSION WITH RANDOM OUTPUT VARIABLE AND PROBABILISTIC CONSTRAINTS
    Abaszade, M.
    Effati, S.
    IRANIAN JOURNAL OF FUZZY SYSTEMS, 2017, 14 (01) : 43 - 60
  • [4] Variable selection by permutation applied in support vector regression models
    da Cunha, Pedro H. P.
    de Paulo, Ellisson H.
    Folli, Gabriely Silveira
    Nascimento, Marcia H. C.
    Moro, Mariana K.
    Filgueiras, Paulo R.
    JOURNAL OF CHEMOMETRICS, 2022, 36 (10)
  • [5] Data selection using support vector regression
    Richman, Michael B.
    Leslie, Lance M.
    Trafalis, Theodore B.
    Mansouri, Hicham
    ADVANCES IN ATMOSPHERIC SCIENCES, 2015, 32 (03) : 277 - 286
  • [6] Data Selection Using Support Vector Regression
    Michael B. RICHMAN
    Lance M. LESLIE
    Theodore B. TRAFALIS
    Hicham MANSOURI
    Advances in Atmospheric Sciences, 2015, 32 (03) : 277 - 286
  • [7] Data selection using support vector regression
    Michael B. Richman
    Lance M. Leslie
    Theodore B. Trafalis
    Hicham Mansouri
    Advances in Atmospheric Sciences, 2015, 32 : 277 - 286
  • [8] A new variable selection approach using Random Forests
    Hapfelmeier, A.
    Ulm, K.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2013, 60 : 50 - 69
  • [9] A comparison of Gaussian process regression, random forests and support vector regression for burn severity assessment in diseased forests
    Hultquist, Carolynne
    Chen, Gang
    Zhao, Kaiguang
    REMOTE SENSING LETTERS, 2014, 5 (08) : 723 - 732
  • [10] Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting
    Sanchez, Juan Carlos Moreno
    Mesa, Hector Gabriel Acosta
    Espinosa, Adrian Trueba
    Castilla, Sergio Ruiz
    Lamont, Farid Garcia
    SMART AGRICULTURAL TECHNOLOGY, 2025, 10