Variable selection using support vector regression and random forests: A comparative study

Cited by: 21
Authors
Ben Ishak, Anis [1 ]
Affiliations
[1] Univ Tunis, ISGT, BESTMOD LR99ES04, Le Bardo 2000, Tunisia
Keywords
Variable importance score; variable selection; support vector regression; random forests; nonlinearity; stepwise algorithm; curse of dimensionality; selection bias; DIHYDROFOLATE-REDUCTASE; NEURAL-NETWORKS; GENE SELECTION; CLASSIFICATION; INHIBITION; BOUNDS;
DOI
10.3233/IDA-150795
CLC classification
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Variable selection is crucial for improving interpretation quality and forecasting accuracy. To this end, it is important to choose a dimension reduction technique suited to the specificity and characteristics of the data at hand. In this paper, the problem of variable selection for linear and nonlinear regression is investigated in depth, and the curse-of-dimensionality issue is also addressed. An intensive comparative study is performed between Support Vector Regression (SVR) and Random Forests (RF), first for variable importance assessment and then for variable selection. The main contribution of this work is twofold: to expose experimental insights into the efficiency of variable ranking and selection based on SVR and on RF, and to provide a benchmark study that helps researchers choose the appropriate method for their data. Experiments on simulated and real-world datasets have been carried out. Results show that the SVR score Ga is recommended for variable ranking in linear situations, whereas the RF score is preferable in nonlinear cases. Moreover, we found that RF models are more efficient for selecting variables, especially when used with an external score of importance.
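As a rough illustration of the workflow summarized above, the sketch below ranks variables with an RF importance score and with the weight magnitudes of a linear SVR, then selects a nested subset by cross-validation. It is not the paper's exact scores or algorithm; it assumes scikit-learn, synthetic Friedman benchmark data, and illustrative hyperparameters.

    # Minimal sketch of importance-based ranking and selection with RF and a
    # linear SVR (not the paper's exact procedure). Assumes scikit-learn; the
    # Friedman benchmark data and all hyperparameters are illustrative choices.
    import numpy as np
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    X, y = make_friedman1(n_samples=300, n_features=20, noise=1.0, random_state=0)

    # Random-Forest importance score (mean decrease in impurity), ranked high to low.
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
    rf_rank = np.argsort(rf.feature_importances_)[::-1]

    # Linear-SVR importance score: magnitude of the fitted weight vector w.
    svr = SVR(kernel="linear", C=1.0).fit(X, y)
    svr_rank = np.argsort(np.abs(svr.coef_).ravel())[::-1]

    # Nested-subset selection: add variables in ranked order and keep the subset
    # with the best 5-fold cross-validated R^2.
    def select(rank, estimator):
        best_score, best_k = -np.inf, 1
        for k in range(1, len(rank) + 1):
            score = cross_val_score(estimator, X[:, rank[:k]], y, cv=5).mean()
            if score > best_score:
                best_score, best_k = score, k
        return rank[:best_k], best_score

    subset, score = select(rf_rank, RandomForestRegressor(n_estimators=200, random_state=0))
    print("RF-ranked subset:", subset, "CV R^2:", round(score, 3))

Swapping rf_rank for svr_rank in the call to select reproduces the same selection loop driven by the SVR-based ranking instead.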
Pages: 83-104
Number of pages: 22
Related papers
50 records in total
  • [41] Variable selection and uncertainty analysis of scale growth rate under pre-salt oil wells conditions using support vector regression
    Droguett, Enrique L.
    Lins, Isis D.
    Moura, Marcio C.
    Zio, Enrico
    Jacinto, Carlos M.
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART O-JOURNAL OF RISK AND RELIABILITY, 2015, 229 (04) : 319 - 326
  • [42] Variable selection for the linear support vector machine
    Zhu, Ji
    Zou, Hui
    TRENDS IN NEURAL COMPUTATION, 2007, 35 : 35 - +
  • [43] Comparative Study of Kriging and Support Vector Regression for Structural Engineering Applications
    Moustapha, Maliki
    Bourinet, Jean-Marc
    Guillaume, Benoit
    Sudret, Bruno
    ASCE-ASME JOURNAL OF RISK AND UNCERTAINTY IN ENGINEERING SYSTEMS PART A-CIVIL ENGINEERING, 2018, 4 (02):
  • [44] Support vector regression for porosity prediction in a heterogeneous reservoir: A comparative study
    Al-Anazi, A. F.
    Gates, I. D.
    COMPUTERS & GEOSCIENCES, 2010, 36 (12) : 1494 - 1503
  • [45] Raman spectroscopy combined with support vector regression and variable selection method for accurately predicting salmon fillets storage time
    Li, Peng
    Ma, Junchao
    Zhong, Nan
    OPTIK, 2021, 247
  • [46] Variable selection in multivariate linear regression with random predictors
    Mbina, Alban Mbina
    Nkiet, Guy Martial
    N'guessan, Assi
    SOUTH AFRICAN STATISTICAL JOURNAL, 2023, 57 (01) : 27 - 44
  • [47] Optimization of hyperparameters and feature selection for random forests and support vector machines by artificial bee colony algorithm
    Kondo, H.
    Asanuma, Y.
    TRANSACTIONS OF THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, 2019, 34 (02):
  • [48] A new regularized least squares support vector regression for gene selection
    Chen, Pei-Chun
    Huang, Su-Yun
    Chen, Wei J.
    Hsiao, Chuhsing K.
    BMC BIOINFORMATICS, 10
  • [49] Model selection of support vector regression using particle swarm optimization algorithm
    Yang, HZ
    Shao, XG
    Chen, G
    Ding, F
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES A-MATHEMATICAL ANALYSIS, 2006, 13 : 1417 - 1425