Binary Classification with Imbalanced Data

Cited by: 3
Authors
Chiang, Jyun-You [1]
Lio, Yuhlong [2]
Hsu, Chien-Ya [3]
Ho, Chia-Ling [4]
Tsai, Tzong-Ru [3]
Affiliations
[1] Southwestern Univ Finance & Econ, Sch Stat, Chengdu 611130, Peoples R China
[2] Univ South Dakota, Dept Math Sci, Vermillion, SD 57069 USA
[3] Tamkang Univ, Dept Stat, New Taipei 251301, Taiwan
[4] Tamkang Univ, Dept Risk Management & Insurance, New Taipei 251301, Taiwan
Keywords
artificial neural network; expectation-maximization algorithm; Entropy; logistic regression; zero-inflated model; ZERO-INFLATED POISSON; BAYESIAN-ANALYSIS; NEURAL-NETWORKS; COUNT DATA; REGRESSION; MODEL
DOI
10.3390/e26010015
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
When a binary response variable contains an excess of zeros, the data are imbalanced, and imbalanced data make binary classification difficult. To simplify the numerical computation of the maximum likelihood estimates of the zero-inflated Bernoulli (ZIBer) model parameters under imbalanced data, an expectation-maximization (EM) algorithm is proposed. A logistic regression model links the Bernoulli probabilities to the covariates in the ZIBer model, and the predictive performance of the ZIBer model, LightGBM, and artificial neural network (ANN) procedures is compared by Monte Carlo simulation. The results show that no single method dominates the others in predictive performance on imbalanced data. The LightGBM and ZIBer models are more competitive than the ANN model for zero-inflated imbalanced data sets.
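The EM idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the parametrization is an assumption: each observation is a structural zero with a constant probability `omega`, and otherwise a Bernoulli draw with success probability given by a logistic regression on the covariates. The function names (`fit_ziber_em`, `simulate_ziber`, `observed_loglik`) and the Newton-based M-step are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def observed_loglik(X, y, beta, omega):
    # Assumed model: P(Y=1 | x) = (1 - omega) * sigmoid(x @ beta).
    p1 = np.clip((1.0 - omega) * sigmoid(X @ beta), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(p1) + (1.0 - y) * np.log(1.0 - p1)))

def fit_ziber_em(X, y, n_iter=200, tol=1e-8):
    """EM for the assumed zero-inflated Bernoulli model (illustrative only)."""
    n, d = X.shape
    beta = np.zeros(d)
    omega = 0.5 * np.mean(y == 0)  # crude starting value
    history = [observed_loglik(X, y, beta, omega)]
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        p = sigmoid(X @ beta)
        tau = np.where(y == 0, omega / (omega + (1.0 - omega) * (1.0 - p)), 0.0)
        # M-step for omega: average posterior structural-zero probability.
        omega = tau.mean()
        # M-step for beta: weighted logistic regression with weights 1 - tau,
        # solved here by Newton iterations.
        w = 1.0 - tau
        for _ in range(25):
            p = sigmoid(X @ beta)
            grad = X.T @ (w * (y - p))
            hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(d)
            step = np.linalg.solve(hess, grad)
            beta = beta + step
            if np.max(np.abs(step)) < 1e-10:
                break
        history.append(observed_loglik(X, y, beta, omega))
        if abs(history[-1] - history[-2]) < tol:
            break
    return beta, omega, history

def simulate_ziber(n, beta, omega, seed=0):
    # Hypothetical data generator: intercept plus one standard normal covariate.
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    p = sigmoid(X @ beta)
    structural_zero = rng.random(n) < omega
    y = np.where(structural_zero, 0, rng.random(n) < p).astype(float)
    return X, y
```

A call such as `beta_hat, omega_hat, history = fit_ziber_em(*simulate_ziber(2000, np.array([0.5, 1.5]), 0.3))` returns the fitted coefficients, the estimated zero-inflation probability, and the observed-data log-likelihood after each iteration. With a constant `omega`, the zero-inflation and intercept parameters tend to be only weakly separated by the data, so estimates from a sketch like this may be unstable in small samples.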
Pages: 16