Feature Selection in Multiple Linear Regression Problems with Fewer Samples Than Features

Cited by: 2
Authors
Schmude, Paul [1 ]
Affiliations
[1] Sonovum AG, Perlickstr 5, D-04103 Leipzig, Germany
Source
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2017, PT I | 2017 / Vol. 10208
Keywords
Overfitting; Feature selection; Filter method; Correlation; PCA; PLS; Forward selection; Genetic algorithm;
DOI
10.1007/978-3-319-56148-6_7
CLC number
R318 [Biomedical Engineering]
Subject classification code
0831
Abstract
Feature selection is of utmost importance for problems with large p (number of features) and small n (number of samples): using too many features in a final model will most likely result in overfitting. There are many ways to select a subset of features to represent the data; this paper illustrates correlation filters, forward selection, and a genetic algorithm for feature selection, and PCA and PLS as transformation methods. The methods are tested on three artificial data sets and one data set from an ultrasound study. The results show that no method excels on all problems and that each method gives different insights into the data. The greedy forward selection usually overfits and shows the largest difference between training and testing data, while PLS and PCA perform worse on the artificial data but better on the ultrasound data.
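A minimal sketch of the kind of pipeline the abstract describes, assuming scikit-learn and a synthetic p >> n data set; the sample and feature counts (n = 60, p = 200), the filter size k, and the use of SequentialFeatureSelector are illustrative assumptions, not the paper's actual setup or data.

# Sketch (not the paper's code): a correlation filter followed by greedy
# forward selection on a p >> n linear regression problem, comparing
# training and testing fit to expose overfitting.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SequentialFeatureSelector

# Artificial data with far more features than samples (p = 200, n = 60),
# of which only a handful are informative.
X, y = make_regression(n_samples=60, n_features=200, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=0)

# 1) Correlation filter: keep the k features most correlated with y,
#    computed on the training data only.
k = 10
corr = np.array([abs(np.corrcoef(X_train[:, j], y_train)[0, 1])
                 for j in range(X_train.shape[1])])
keep = np.argsort(corr)[-k:]

# 2) Greedy forward selection on the filtered features.
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X_train[:, keep], y_train)
selected = keep[sfs.get_support()]

# 3) Compare training and testing R^2; a large gap signals overfitting.
model = LinearRegression().fit(X_train[:, selected], y_train)
print("train R^2:", model.score(X_train[:, selected], y_train))
print("test  R^2:", model.score(X_test[:, selected], y_test))

Fitting both the filter and the selector on the training split only keeps the test score honest; the gap between the two R^2 values illustrates the train/test difference the abstract attributes to greedy forward selection.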
Pages: 85-95
Page count: 11