Ensemble with Divisive Bagging for Feature Selection in Big Data

Cited by: 1
Authors
Park, Yousung [1]
Kwon, Tae Yeon [2]
Affiliations
[1] Korea Univ, Dept Stat, 145 Anam Ro, Seoul 02841, South Korea
[2] Hankuk Univ Foreign Studies, Dept Int Finance, 81 Oedae Ro, Yongin 17035, Gyeonggi Do, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Feature selection; Bagging; Voting system; Ensemble; Big data; Feature importance; C55; C52; C63; C80; C15; C51; P-VALUES; BIASED-ESTIMATION; REGRESSION; LASSO;
DOI
10.1007/s10614-024-10741-y
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
We introduce Ensemble with Divisive Bagging (EDB), a new feature selection method in linear models, to address the excessive selection of features in big data due to deflated p-values. Extensive simulations show that EDB derives parsimonious models without loss of predictive performance compared to lasso, ridge, elastic-net, LARS, and FS. We also show that EDB estimates feature importance in linear models more accurately compared to Random Forest, XGBoost, and CatBoost. Additionally, we apply EDB to feature selection in models for house prices and loan defaults. Our findings highlight the advantages of EDB: (1) effectively addressing deflated p-values and preventing the inclusion of extraneous features; (2) ensuring unbiased coefficient estimation; (3) adaptability to various models relying on p-value-based inferences; (4) construction of statistically explainable models with feature attribution and importance by preserving inferences based on a linear model and p-values; and (5) allowing application to linear economic models without altering the previous functional form of the model.
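The abstract does not spell out the EDB algorithm, so the following is only a minimal sketch of the general idea it describes: p-values deflate as the sample grows, and voting over smaller subsamples can restore parsimony. It is not the authors' implementation. The disjoint-block split, the majority-vote threshold, the normal approximation for p-values, the simulated data, and the names `ols_pvalues` and `vote_select` are all assumptions made for illustration.

```python
import math
import numpy as np


def ols_pvalues(X, y):
    """Two-sided p-values for OLS slopes (intercept fitted, normal approximation)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # add intercept column
    XtX_inv = np.linalg.inv(A.T @ A)
    beta = XtX_inv @ (A.T @ y)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])     # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    z = np.abs(beta / se)[1:]                     # drop the intercept
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided normal p-value
    return np.array([math.erfc(v / math.sqrt(2)) for v in z])


def vote_select(X, y, n_blocks, alpha=0.05, threshold=0.5, seed=0):
    """Hypothetical voting scheme: split rows into disjoint blocks, test per block,
    keep features significant in more than `threshold` of the blocks."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X.shape[0]), n_blocks)
    votes = np.zeros(X.shape[1])
    for b in blocks:
        votes += ols_pvalues(X[b], y[b]) < alpha
    share = votes / n_blocks
    return np.flatnonzero(share > threshold), share


# Simulated data (assumed): 3 strong features and 7 with negligible effects.
rng = np.random.default_rng(42)
n, p = 200_000, 10
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 1.0, 1.0] + [0.02] * 7)
y = X @ beta_true + rng.standard_normal(n)

# Full-sample test: with n this large, even the negligible effects look significant.
full_selected = np.flatnonzero(ols_pvalues(X, y) < 0.05)

# Block-wise voting: only features that are significant in most blocks survive.
vote_selected, share = vote_select(X, y, n_blocks=100)

print("full-sample selection:", full_selected)
print("voting selection:     ", vote_selected)
print("vote share per feature:", np.round(share, 2))
```

In this simulation the full-sample test flags every feature, while the block-wise vote is designed to retain only those with a practically meaningful effect. The block count and vote threshold are tuning choices here, not values taken from the paper.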
Pages: 34
Related papers
50 in total
  • [1] Bagging Ensemble Selection
    Sun, Quan
    Pfahringer, Bernhard
    AI 2011: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 7106 : 251 - 260
  • [2] Heterogeneous Ensemble Feature Selection Model (HEFSM) for Big Data Analytics
    Priyadharsini M.
    Karuppasamy K.
    Computer Systems Science and Engineering, 2023, 45 (02): : 2187 - 2205
  • [3] Bagging and Feature Selection for Classification with Incomplete Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 471 - 486
  • [4] Stable bagging feature selection on medical data
    Alelyani, Salem
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [5] Stable bagging feature selection on medical data
    Salem Alelyani
    Journal of Big Data, 8
  • [6] Ensemble classifier based big data classification with hybrid optimal feature selection
    Pamila, J. C. Miraclin Joyce
    Selvi, R. Senthamil
    Santhi, P.
    Nithya, T. M.
    ADVANCES IN ENGINEERING SOFTWARE, 2022, 173
  • [7] An Optimized Bagging Learning with Ensemble Feature Selection Method for URL Phishing Detection
    Ponni Ponnusamy
    Prabha Dhandayudam
    Journal of Electrical Engineering & Technology, 2024, 19 : 1881 - 1889
  • [8] An Optimized Bagging Learning with Ensemble Feature Selection Method for URL Phishing Detection
    Ponnusamy, Ponni
    Dhandayudam, Prabha
    JOURNAL OF ELECTRICAL ENGINEERING & TECHNOLOGY, 2024, 19 (03) : 1881 - 1889
  • [9] A STUDY ON FEATURE SELECTION IN BIG DATA
    Manikandan, R. P. S.
    Kalpana, A. M.
    2017 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2017,
  • [10] A Classifier Using Online Bagging Ensemble Method for Big Data Stream Learning
    Yanxia Lv
    Sancheng Peng
    Ying Yuan
    Cong Wang
    Pengfei Yin
    Jiemin Liu
    Cuirong Wang
    Tsinghua Science and Technology, 2019, (04) : 379 - 388