Performance of asymmetric links and correction methods for imbalanced data in binary regression

Cited by: 10
Authors
Huayanay, Alex de la Cruz [1]
Bazan, Jorge L. [2]
Cancho, Vicente G. [2]
Dey, Dipak K. [3]
Affiliations
[1] Interinstitutional Graduate Program in Statistics, USP/UFSCar, Sao Carlos, SP, Brazil
[2] University of Sao Paulo, Department of Applied Mathematics and Statistics, Sao Carlos, SP, Brazil
[3] University of Connecticut, Department of Statistics, Mansfield, CT, USA
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Asymmetric link; binary regression; imbalanced data; predictive evaluation; quantile residuals; similarity measures; cross-validation; model; probit
DOI
10.1080/00949655.2019.1593984
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In binary regression, imbalanced data arise when values equal to zero (or one) occur in a proportion significantly greater than that of values equal to one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them with the use of asymmetric links. Results from a simulation study show that the correction methods do not adequately correct the bias in the estimated regression coefficients, and that models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among the various ones proposed. The parameters are estimated with a Bayesian approach using Hamiltonian Monte Carlo with the No-U-Turn Sampler (NUTS) algorithm, and the models were compared using different criteria for model comparison, predictive evaluation and quantile residuals.
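The power and reverse power links and the quantile residuals named in the abstract can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in this literature: a power link p = F(η)^λ and a reverse power link p = 1 - F(-η)^λ with λ > 0 and a logistic baseline F, plus randomized quantile residuals for a Bernoulli response. The record does not give the authors' exact parameterization, and all function names below are hypothetical.

```python
import math
import random
from statistics import NormalDist


def logistic_cdf(eta: float) -> float:
    """Baseline symmetric CDF F (logistic), computed in a numerically stable way."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def power_link(eta: float, lam: float) -> float:
    """Power link (assumed form): p = F(eta) ** lam, with lam > 0."""
    return logistic_cdf(eta) ** lam


def reverse_power_link(eta: float, lam: float) -> float:
    """Reverse power link (assumed form): p = 1 - F(-eta) ** lam, with lam > 0."""
    return 1.0 - logistic_cdf(-eta) ** lam


def quantile_residual(y: int, p: float, rng: random.Random) -> float:
    """Randomized quantile residual for a Bernoulli(p) response y in {0, 1}.

    u is drawn uniformly between P(Y < y) and P(Y <= y) and then mapped
    through the standard normal quantile function.
    """
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    lower = 0.0 if y == 0 else 1.0 - p
    upper = 1.0 - p if y == 0 else 1.0
    u = rng.uniform(lower, upper)
    eps = 1e-12
    u = min(max(u, eps), 1.0 - eps)  # keep inv_cdf away from exactly 0 or 1
    return NormalDist().inv_cdf(u)


if __name__ == "__main__":
    rng = random.Random(0)
    # At eta = 0 the baseline F equals 0.5, so lam alone shifts the probability.
    for lam in (0.5, 1.0, 2.0):
        print(lam, power_link(0.0, lam), reverse_power_link(0.0, lam))
    print(quantile_residual(1, 0.9, rng))
```

With λ = 1 both links reduce to the symmetric logit; λ ≠ 1 skews the response curve, which is the kind of flexibility the abstract credits for better results on certain types of imbalanced data. The Bayesian fitting step (HMC with NUTS) is not shown.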
Pages: 1694-1714
Number of pages: 21
Related papers (50 in total)
  • [21] Bayesian skew-probit regression for binary response data
    Bazan, Jorge L.
    Romeo, Jose S.
    Rodrigues, Josemar
    BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2014, 28 (04) : 467 - 482
  • [22] Boosting methods for multi-class imbalanced data classification: an experimental review
    Tanha, Jafar
    Abdi, Yousef
    Samadi, Negin
    Razzaghi, Nazila
    Asadpour, Mohammad
    JOURNAL OF BIG DATA, 2020, 7 (01)
  • [23] Performance Measurement of Federated Learning on Imbalanced Data
    Sittijuk, Pramote
    Tamee, Kriengsuk
    2021 18TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE-2021), 2021
  • [24] A Correction Method of a Base Classifier Applied to Imbalanced Data Classification
    Trajdos, Pawel
    Kurzynski, Marek
    COMPUTATIONAL SCIENCE - ICCS 2020, PT IV, 2020, 12140 : 88 - 102
  • [25] Anomaly Detection in Smart Grids with Imbalanced Data Methods
    Promper, Christian
    Engel, Dominik
    Green, Robert C., II
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017
  • [26] Evaluation of Sampling Methods for Learning from Imbalanced Data
    Goel, Garima
    Maguire, Liam
    Li, Yuhua
    McLoone, Sean
    INTELLIGENT COMPUTING THEORIES, 2013, 7995 : 392 - 401
  • [27] Robust weighted kernel logistic regression in imbalanced and rare events data
    Maalouf, Maher
    Trafalis, Theodore B.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2011, 55 (01) : 168 - 183
  • [28] Dense fuzzy support vector machine to binary classification for imbalanced data
    Wang, Qingling
    Zheng, Jian
    Zhang, Wenjing
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (06) : 9643 - 9653
  • [29] Meta-learning for imbalanced data and classification ensemble in binary classification
    Lin, Sung-Chiang
    Chang, Yuan-chin I.
    Yang, Wei-Ning
    NEUROCOMPUTING, 2009, 73 (1-3) : 484 - 494
  • [30] A modification of logistic regression with imbalanced data: F-measure-oriented Lasso-logistic regression
    My, Bui T. T.
    Ta, Bao Q.
    SCIENCEASIA, 2023, 49 : 68 - 77