Variable selection using support vector regression and random forests: A comparative study

Cited by: 20
Author
Ben Ishak, Anis [1]
Affiliation
[1] Univ Tunis, ISGT, BESTMOD LR99ES04, Le Bardo 2000, Tunisia
Keywords
Variable importance score; variable selection; support vector regression; random forests; nonlinearity; stepwise algorithm; curse of dimensionality; selection bias; dihydrofolate reductase; neural networks; gene selection; classification; inhibition; bounds
DOI
10.3233/IDA-150795
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Variable selection is crucial for improving both interpretability and forecasting accuracy. To this end, it is important to choose a dimension reduction technique suited to the specific characteristics of the data being processed. In this paper, the problem of variable selection for linear and nonlinear regression is investigated in depth, and the curse of dimensionality is also addressed. An extensive comparative study is carried out between Support Vector Regression (SVR) and Random Forests (RF), first for assessing variable importance and then for selecting variables. The main contribution of this work is twofold: to present experimental insights into the efficiency of variable ranking and selection based on SVR and on RF, and to provide a benchmark study that helps researchers choose the appropriate method for their data. Experiments were carried out on simulated and real-world datasets. The results show that the SVR score Ga is recommended for variable ranking in linear settings, whereas the RF score is preferable in nonlinear cases. Moreover, RF models were found to be more efficient for selecting variables, especially when used with an external importance score.
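The abstract describes a two-stage pipeline: rank variables by an importance score, then select a subset stepwise. The Python sketch below illustrates that general idea only. It is not the paper's SVR score Ga or its RF procedure. It assumes a permutation-importance score (the increase in validation error after shuffling one variable) and a backward-elimination loop that keeps the nested subset with the lowest validation error. To keep the example runnable with the standard library alone, a k-nearest-neighbour regressor is swapped in for SVR and RF. All function names, the simulated data (linear in x0, nonlinear in x1, with x2..x4 as pure noise) and the parameters (k=5, sample sizes, seed) are hypothetical.

```python
import math
import random
from statistics import fmean


def knn_predict(train_X, train_y, x, k=5):
    # Stand-in model (not SVR or RF): average the targets of the k training
    # points closest to x in squared Euclidean distance.
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )[:k]
    return fmean(y for _, y in nearest)


def mse(train_X, train_y, eval_X, eval_y, cols, k=5):
    # Mean squared prediction error on (eval_X, eval_y) using only the columns in `cols`.
    def subset(X):
        return [[row[c] for c in cols] for row in X]

    tr, ev = subset(train_X), subset(eval_X)
    return fmean((knn_predict(tr, train_y, x, k) - y) ** 2 for x, y in zip(ev, eval_y))


def permutation_importance(train_X, train_y, val_X, val_y, cols, rng, k=5):
    # Importance of column c = increase in validation MSE after shuffling column c.
    base = mse(train_X, train_y, val_X, val_y, cols, k)
    scores = {}
    for c in cols:
        shuffled = [row[c] for row in val_X]
        rng.shuffle(shuffled)
        perm_X = [row[:c] + [v] + row[c + 1:] for row, v in zip(val_X, shuffled)]
        scores[c] = mse(train_X, train_y, perm_X, val_y, cols, k) - base
    return scores


def backward_elimination(train_X, train_y, val_X, val_y, rng, k=5):
    # Stepwise backward elimination: repeatedly drop the least important variable,
    # record the validation MSE of each nested subset, and return the best subset.
    cols = list(range(len(train_X[0])))
    path = [(list(cols), mse(train_X, train_y, val_X, val_y, cols, k))]
    while len(cols) > 1:
        scores = permutation_importance(train_X, train_y, val_X, val_y, cols, rng, k)
        cols.remove(min(cols, key=scores.get))
        path.append((list(cols), mse(train_X, train_y, val_X, val_y, cols, k)))
    best_cols, _ = min(path, key=lambda step: step[1])
    return best_cols, path


def make_data(n, rng):
    # Hypothetical simulated data: y is linear in x0, nonlinear in x1; x2..x4 are noise.
    X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(n)]
    y = [3 * r[0] + math.sin(3 * r[1]) + rng.gauss(0, 0.1) for r in X]
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    train_X, train_y = make_data(200, rng)
    val_X, val_y = make_data(100, rng)
    best, path = backward_elimination(train_X, train_y, val_X, val_y, rng)
    for cols, err in path:
        print(cols, round(err, 3))
    print("selected:", best)
```

In this sketch, the same validation split both ranks the variables and picks the final subset, so the reported errors are optimistic. This is the selection-bias issue named in the keywords. An honest performance estimate would need a separate test set, or an outer resampling loop around the whole selection procedure.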
Pages: 83-104
Number of pages: 22