Feature selection algorithms in generalized additive models under concurvity

Cited by: 7
Authors
Kovacs, Laszlo [1]
Affiliations
[1] Corvinus Univ Budapest, Dept Stat, Budapest, Hungary
Keywords
Generalized additive model; Feature selection; Regularization; Boosting; Genetic algorithm; Harmony search algorithm; VARIABLE SELECTION; REGRESSION; PERFORMANCE; CONSISTENCY; LIKELIHOOD; STRENGTH;
DOI
10.1007/s00180-022-01292-7
CLC classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In this paper, the properties of 10 different feature selection algorithms for generalized additive models (GAMs) are compared on one simulated and two real-world datasets under concurvity. Concurvity can be interpreted as redundancy in the feature set of a GAM. Like multicollinearity in linear models, concurvity causes unstable parameter estimates in GAMs and makes the marginal effects of features harder to interpret. Feature selection algorithms for GAMs can be separated into four clusters: stepwise, boosting, regularization and concurvity-controlled methods. Our numerical results show that algorithms with no constraints on concurvity tend to select a large feature set, without significant improvements in predictive performance compared to a more parsimonious feature set. A large feature set is accompanied by harmful concurvity in the proposed models. To tackle the concurvity phenomenon, recent feature selection algorithms such as the mRMR and the HSIC-Lasso incorporate constraints on concurvity into their objective functions. However, these algorithms interpret concurvity as a pairwise non-linear relationship between features, so they do not account for the case when a feature can be accurately estimated as a multivariate function of several other features. This is confirmed by our numerical results. Our own solution to the problem, a hybrid genetic-harmony search algorithm (HA), introduces constraints on multivariate concurvity directly. Due to this constraint, the HA proposes a small, non-redundant feature set with predictive performance similar to that of models with far more features.
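The abstract's distinction between pairwise and multivariate concurvity can be illustrated with a short sketch: for each feature, estimate how well it can be predicted non-linearly from all the other features, and treat a high cross-validated R² as a redundancy flag. This is an illustrative proxy only, not the paper's exact concurvity measure or the HA algorithm; the synthetic data and the random-forest predictor are assumptions made for the example.

```python
# Illustrative proxy for multivariate concurvity (an assumption, not the
# paper's method): a feature that is well predicted by a non-linear model
# of the remaining features is redundant in the multivariate sense.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def concurvity_scores(X, seed=0):
    """For each column j, return the cross-validated R^2 of predicting
    X[:, j] from all the other columns (clamped at 0)."""
    scores = []
    for j in range(X.shape[1]):
        rest = np.delete(X, j, axis=1)          # all features except j
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        r2 = cross_val_score(model, rest, X[:, j], cv=5, scoring="r2").mean()
        scores.append(max(r2, 0.0))
    return scores

rng = np.random.default_rng(42)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2**2 + 0.05 * rng.normal(size=500)   # multivariate function of x1, x2
x4 = rng.normal(size=500)                       # genuinely independent feature
X = np.column_stack([x1, x2, x3, x4])

scores = concurvity_scores(X)
# x3 receives a high score (redundant), x4 a near-zero score,
# even though no single pairwise relationship fully explains x3.
```

Note that a pairwise measure (e.g. a correlation of x3 with x1 alone or x2 alone) would understate x3's redundancy, which is exactly the limitation of pairwise concurvity constraints described in the abstract.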
Pages: 461-493
Number of pages: 33
Related papers
(50 records)
  • [11] Genetic algorithms for the selection of smoothing parameters in additive models
    Krause, R
    Tutz, G
    COMPUTATIONAL STATISTICS, 2006, 21 (01) : 9 - 31
  • [13] Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution
    Galligan, Marie C.
    Saldova, Radka
    Campbell, Matthew P.
    Rudd, Pauline M.
    Murphy, Thomas B.
    BMC BIOINFORMATICS, 2013, 14
  • [14] Generalized additive models with flexible response functions
    Spiegel, Elmar
    Kneib, Thomas
    Otto-Sobotka, Fabian
    STATISTICS AND COMPUTING, 2019, 29 (01) : 123 - 138
  • [15] Exponential-bound property of estimators and variable selection in generalized additive models
    Wang, Xiaoming
    Carriere, K. C.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2007, 36 (06) : 1105 - 1122
  • [16] Plots for Generalized Additive Models
    Olive, David J.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2013, 42 (18) : 2610 - 2628
  • [17] Generalized Sparse Additive Models
    Haris, Asad
    Simon, Noah
    Shojaie, Ali
    JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [18] Heuristic algorithms for feature selection under Bayesian models with block-diagonal covariance structure
    Pour, Ali Foroughi
    Dalton, Lori A.
    BMC BIOINFORMATICS, 2018, 19
  • [19] Bayesian variable selection in generalized additive partial linear models
    Banerjee, Sayantan
    Ghosal, Subhashis
    STAT, 2014, 3 (01): : 363 - 378
  • [20] Heuristic Algorithms for Feature Selection under Bayesian Models with Block-diagonal Covariance Structure
    Pour, Ali Foroughi
    Dalton, Lori A.
    ACM-BCB 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017, : 758 - 759