Feature Selection in Multiple Linear Regression Problems with Fewer Samples Than Features

Cited by: 2
Authors
Schmude, Paul [1]
Affiliation
[1] Sonovum AG, Perlickstr 5, D-04103 Leipzig, Germany
Source
BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2017, PT I | 2017 / Vol. 10208
Keywords
Overfitting; Feature selection; Filter method; Correlation; PCA; PLS; Forward selection; Genetic algorithm
DOI
10.1007/978-3-319-56148-6_7
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Feature selection is of utmost importance for problems with large p (number of features) and small n (number of samples). Using too many features in the final model will most probably result in overfitting. There are many ways to select a subset of features to represent the data; this paper illustrates correlation filters, forward selection, and a genetic algorithm for feature selection, and PCA and PLS as transformation methods. The methods are tested on three artificial data sets and one data set from an ultrasound study. Results show that no method excels on all problems and that every method gives different insights into the data. The greedy forward selection usually overfits and shows the largest difference between training and testing data; PLS and PCA perform worse on the artificial data but better on the ultrasound data.
Pages: 85-95
Page count: 11
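
To make the abstract's comparison concrete, the following is a minimal sketch of two of the named selection methods, a correlation filter and greedy forward selection, on a p > n regression problem. It is not the paper's implementation: the synthetic data, the sizes (30 training samples, 100 features), the choice of 5 selected features, and all function names are assumptions made for illustration only.

```python
import numpy as np

def correlation_filter(X, y, k):
    """Indices of the k features with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features get correlation 0
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))[:k]

def fit_ols(X, y):
    """Least-squares coefficients for a linear model with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    """Mean squared error of the fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((A @ coef - y) ** 2))

def forward_selection(X, y, k):
    """Greedily add the feature that most reduces the training MSE."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            cols = selected + [j]
            err = mse(fit_ols(X[:, cols], y), X[:, cols], y)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Assumed synthetic setup: only features 0 and 1 carry signal, and p > n_train.
rng = np.random.default_rng(0)
n_train, n_test, p = 30, 200, 100
X = rng.normal(size=(n_train + n_test, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_train + n_test)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

subsets = {
    "correlation filter": list(correlation_filter(X_train, y_train, 5)),
    "forward selection": forward_selection(X_train, y_train, 5),
}
results = {}
for name, cols in subsets.items():
    coef = fit_ols(X_train[:, cols], y_train)
    results[name] = (
        mse(coef, X_train[:, cols], y_train),  # training error
        mse(coef, X_test[:, cols], y_test),    # held-out error
    )
    print(name, sorted(int(c) for c in cols), results[name])
```

Comparing the two numbers printed for each method gives the training/testing gap that the abstract refers to. Forward selection chooses features by training error alone, so in this sketch it has nothing to stop it from picking noise features once the informative ones are used up.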