Variable selection using support vector regression and random forests: A comparative study

Cited by: 20
Authors
Ben Ishak, Anis [1 ]
Affiliations
[1] Univ Tunis, ISGT, BESTMOD LR99ES04, Le Bardo 2000, Tunisia
Keywords
Variable importance score; variable selection; support vector regression; random forests; nonlinearity; stepwise algorithm; curse of dimensionality; selection bias; DIHYDROFOLATE-REDUCTASE; NEURAL-NETWORKS; GENE SELECTION; CLASSIFICATION; INHIBITION; BOUNDS;
DOI
10.3233/IDA-150795
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Variable selection is crucial for improving both interpretability and forecasting accuracy. To this end, it is important to choose a dimension reduction technique suited to the specificity and characteristics of the data at hand. In this paper, the problem of variable selection for linear and nonlinear regression is investigated in depth, and the curse of dimensionality is also addressed. An intensive comparative study is performed between Support Vector Regression (SVR) and Random Forests (RF), first for variable importance assessment and then for variable selection. The contribution of this work is twofold: to provide experimental insights into the efficiency of variable ranking and selection based on SVR and on RF, and to offer a benchmark study that helps researchers choose the appropriate method for their data. Experiments on simulated and real-world datasets have been carried out. Results show that the SVR score Ga is recommended for variable ranking in linear situations, whereas the RF score is preferable in nonlinear cases. Moreover, RF models are found to be more efficient for selecting variables, especially when used with an external importance score.
Pages: 83-104 (22 pages)
Related Papers
50 items in total
  • [11] A comparison of random forests, boosting and support vector machines for genomic selection
    Joseph O Ogutu
    Hans-Peter Piepho
    Torben Schulz-Streeck
    BMC Proceedings, 5 (Suppl 3)
  • [12] SUBSET SELECTION IN NONLINEAR POISSON REGRESSION USING SUPPORT VECTOR REGRESSION: A SIMULATION STUDY
    Desai, S. S.
    Kashid, D. N.
    Sakate, D. M.
    INTERNATIONAL JOURNAL OF AGRICULTURAL AND STATISTICAL SCIENCES, 2018, 14 (01): 13-22
  • [13] Comparing Support Vector Regression and Random Forests for Predicting Malaria Incidence in Mozambique
    Zacarias, Orlando P.
    Boström, Henrik
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER), 2013: 217-221
  • [14] Comparative study of support vector machines and random forests machine learning algorithms on credit operation
    Teles, Germanno
    Rodrigues, Joel J. P. C.
    Rabelo, Ricardo A. L.
    Kozlov, Sergei A.
    SOFTWARE-PRACTICE & EXPERIENCE, 2021, 51 (12): 2492-2500
  • [15] RANDOM FOREST AND SUPPORT VECTOR MACHINE ON FEATURES SELECTION FOR REGRESSION ANALYSIS
    Dewi, Christine
    Chen, Rung-Ching
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2019, 15 (06): 2027-2037
  • [16] Variable selection in support vector regression using angular search algorithm and variance inflation factor
    Folli, Gabriely S.
    Nascimento, Marcia H. C.
    de Paulo, Ellisson H.
    da Cunha, Pedro H. P.
    Romão, Wanderson
    Filgueiras, Paulo R.
    JOURNAL OF CHEMOMETRICS, 2020, 34 (12)
  • [17] VSURF: An R Package for Variable Selection Using Random Forests
    Genuer, Robin
    Poggi, Jean-Michel
    Tuleau-Malot, Christine
    R JOURNAL, 2015, 7 (02): 19-33
  • [18] Variable selection by Random Forests using data with missing values
    Hapfelmeier, A.
    Ulm, K.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 80: 129-139
  • [19] Causal Random Forests Model Using Instrumental Variable Quantile Regression
    Chen, Jau-er
    Hsiang, Chen-Wei
    ECONOMETRICS, 2019, 7 (04)
  • [20] An improved variable selection method for support vector regression in NIR spectral modeling
    Xu, Shu
    Lu, Bo
    Baldea, Michael
    Edgar, Thomas F.
    Nixon, Mark
    JOURNAL OF PROCESS CONTROL, 2018, 67: 83-93