Feature Selection in Multiple Linear Regression Problems with Fewer Samples Than Features

Cited by: 2
Author
Schmude, Paul [1]
Affiliation
[1] Sonovum AG, Perlickstr 5, D-04103 Leipzig, Germany
Source
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2017, PT I | 2017 / Vol. 10208
Keywords
Overfitting; Feature selection; Filter method; Correlation; PCA; PLS; Forward selection; Genetic algorithm
DOI
10.1007/978-3-319-56148-6_7
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Feature selection is essential in problems with large p (number of features) and small n (number of samples), where using too many features in the final model will most likely result in overfitting. There are many ways to select a subset of features to represent the data. This paper illustrates correlation filters, forward selection, and a genetic algorithm as feature selection methods, and PCA and PLS as transformation methods. The methods are tested on three artificial data sets and one data set from an ultrasound study. The results show that no method excels on all problems and that each method gives different insights into the data. Greedy forward selection usually overfits and shows the largest difference between training and testing performance, while PLS and PCA perform worse on the artificial data but better on the ultrasound data.
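To make the large-p, small-n overfitting problem and the correlation filter mentioned in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation or data: the synthetic data set, its dimensions (20 training samples, 100 features, 3 informative), the choice of k = 3, and the helper names `correlation_filter`, `fit_ols`, and `mse` are all assumptions made for illustration.

```python
import numpy as np


def correlation_filter(X, y, k):
    """Return indices of the k features with the largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(r))[:k]


def fit_ols(X, y):
    """Least-squares fit with intercept (minimum-norm solution when p > n)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared error of a fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ coef - y) ** 2))


# Hypothetical synthetic data: only the first three features carry signal.
rng = np.random.default_rng(0)
n_train, n_test, p = 20, 200, 100
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ beta + rng.normal(scale=0.5, size=n_train)
y_test = X_test @ beta + rng.normal(scale=0.5, size=n_test)

# Model using all p features: with p > n it can interpolate the training data.
full = fit_ols(X_train, y_train)

# Filter step: rank features on the training data only, then refit on the subset.
selected = correlation_filter(X_train, y_train, k=3)
small = fit_ols(X_train[:, selected], y_train)

print("all features: train MSE", mse(full, X_train, y_train),
      "test MSE", mse(full, X_test, y_test))
print("filtered    : train MSE", mse(small, X_train[:, selected], y_train),
      "test MSE", mse(small, X_test[:, selected], y_test))
print("selected feature indices:", sorted(selected.tolist()))
```

The gap between training and test error of the all-features model is the overfitting symptom the abstract refers to. The filter is fitted on the training split only, since ranking features on the full data would leak information into the test estimate.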
Pages: 85-95
Page count: 11