Variable selection using support vector regression and random forests: A comparative study

Cited by: 20
Authors
Ben Ishak, Anis [1]
Affiliations
[1] Univ Tunis, ISGT, BESTMOD LR99ES04, Le Bardo 2000, Tunisia
Keywords
Variable importance score; variable selection; support vector regression; random forests; nonlinearity; stepwise algorithm; curse of dimensionality; selection bias; DIHYDROFOLATE-REDUCTASE; NEURAL-NETWORKS; GENE SELECTION; CLASSIFICATION; INHIBITION; BOUNDS;
DOI
10.3233/IDA-150795
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Variable selection is crucial for improving interpretation quality and forecasting accuracy. To this end, it is very interesting to choose an effective dimension reduction technique suitable for processing data according to their specificity and characteristics. In this paper, the problem of variable selection for linear and nonlinear regression is deeply investigated. The curse of dimensionality issue is also addressed. An intensive comparative study is performed between Support Vector Regression (SVR) and Random Forests (RF) for the purpose of variable importance assessment and then for variable selection. The main contribution of this work is twofold: to expose some experimental insights about the efficiency of variable ranking and selection based on SVR and on RF, and to provide a benchmark study that helps researchers to choose the appropriate method for their data. Experiments on simulated and real-world datasets have been carried out. Results show that the SVR score Ga is recommended for variable ranking in linear situations whereas the RF score is preferable in nonlinear cases. Moreover, we found that RF models are more efficient for selecting variables especially when used with an external score of importance.
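The abstract refers to variable importance scores without defining them in this record. As a rough illustration only, the sketch below computes a generic permutation-style importance score (the increase in mean squared error when one input column is shuffled) and ranks the variables by it. It is not the paper's Ga score or its RF score. The names `permutation_importance` and `stand_in_model`, the simulated data, and the fixed predictor standing in for a fitted SVR or RF model are all assumptions made for this example.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each column of X is shuffled.

    `predict` is any callable mapping one row (a list) to a prediction,
    so the same score can be applied to any fitted regression model.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between variable j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


def stand_in_model(row):
    """Hypothetical predictor standing in for a fitted SVR or RF model.

    It uses a linear effect of x0, a nonlinear effect of x1, and ignores x2.
    """
    return 3.0 * row[0] + math.sin(2.0 * row[1])


# Simulated data (an assumption for this sketch): three inputs, one pure noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [stand_in_model(row) + data_rng.gauss(0, 0.1) for row in X]

scores = permutation_importance(stand_in_model, X, y)
# Variable indices ordered from most to least important.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("scores:", scores)
print("ranking:", ranking)
```

Because the score only needs a prediction function, it can serve as an importance score computed outside the model itself. Replacing `stand_in_model` with the prediction method of a fitted regressor would give a model-specific ranking.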
Pages: 83 - 104
Number of pages: 22
Related papers
50 in total
  • [21] SEABED IMAGE SEGMENTATION USING RANDOM FORESTS AND SUPPORT VECTOR MACHINES
    Rimavicius, Tadas
    Gelzinis, Adas
    Vaiciukynas, Evaldas
    Olenin, Sergej
    Saskov, Aleksej
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON ELECTRICAL AND CONTROL TECHNOLOGIES, 2013, : 41 - 44
  • [22] Feature selection for support vector regression using a genetic algorithm
    Mckearnan, Shannon B.
    Vock, David M.
    Marai, G. Elisabeta
    Canahuate, Guadalupe
    Fuller, Clifton D.
    Wolfson, Julian
    BIOSTATISTICS, 2023, 24 (02) : 295 - 308
  • [23] Feature Selection Using Probabilistic Prediction of Support Vector Regression
    Yang, Jian-Bo
    Ong, Chong-Jin
    IEEE TRANSACTIONS ON NEURAL NETWORKS, 2011, 22 (06) : 954 - 962
  • [24] Hybrid Firefly based Simultaneous Gene Selection and Cancer Classification using Support Vector Machines and Random Forests
    Srivastava, Atulji
    Chakrabarti, Saurabh
    Das, Subrata
    Ghosh, Shameek
    Jayaraman, V. K.
    PROCEEDINGS OF SEVENTH INTERNATIONAL CONFERENCE ON BIO-INSPIRED COMPUTING: THEORIES AND APPLICATIONS (BIC-TA 2012), VOL 1, 2013, 201 : 485 - +
  • [25] A comparative study of mutual information-based input variable selection strategies for the displacement prediction of seepage-driven landslides using optimized support vector regression
    Ma, Junwei
    Wang, Yankun
    Niu, Xiaoxu
    Jiang, Sheng
    Liu, Zhiyang
    STOCHASTIC ENVIRONMENTAL RESEARCH AND RISK ASSESSMENT, 2022, 36 (10) : 3109 - 3129
  • [27] Variable Selection for Support Vector Machines
    Bierman, Surette
    Steel, Sarel
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2009, 38 (08) : 1640 - 1658
  • [28] Subpixel urban land cover estimation: Comparing Cubist, Random Forests, and support vector regression
    Walton, Jeffrey T.
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 2008, 74 (10) : 1213 - 1222
  • [29] Sales forecasting of computer products based on variable selection scheme and support vector regression
    Lu, Chi-Jie
    NEUROCOMPUTING, 2014, 128 : 491 - 499
  • [30] A Comparative Analysis on Linear Regression and Support Vector Regression
    Kavitha, S.
    Varuna, S.
    Ramya, R.
    PROCEEDINGS OF 2016 ONLINE INTERNATIONAL CONFERENCE ON GREEN ENGINEERING AND TECHNOLOGIES (IC-GET), 2016,