Performance of asymmetric links and correction methods for imbalanced data in binary regression

Cited by: 10
Authors
Huayanay, Alex de la Cruz [1]
Bazan, Jorge L. [2]
Cancho, Vicente G. [2]
Dey, Dipak K. [3]
Affiliations
[1] USP UFSCar, Interinst Grad Stat, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Dept Appl Math & Stat, Sao Carlos, SP, Brazil
[3] Univ Connecticut, Dept Stat, Mansfield, CT USA
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Asymmetric link; binary regression; imbalanced data; predictive evaluation; quantile residuals; similarity measures; CROSS-VALIDATION; MODEL; PROBIT;
DOI
10.1080/00949655.2019.1593984
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In binary regression, imbalanced data arise when responses equal to zero (or one) occur in a proportion significantly greater than that of the responses equal to one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them with the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct the bias in the estimation of the regression coefficients, and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among those proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.
Pages: 1694-1714
Number of pages: 21
Related papers
50 in total
  • [1] Flexible cloglog links for binomial regression models as an alternative for imbalanced medical data
    Alves, Jessica S. B.
    Bazan, Jorge L.
    Arellano-Valle, Reinaldo B.
    BIOMETRICAL JOURNAL, 2023, 65 (03)
  • [2] Predictive Performance of Logistic Regression for Imbalanced Data with Categorical Covariate
    Abd Rahman, Hezlin Aryani
    Wah, Yap Bee
    Huat, Ong Seng
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2020, 28 (04): 1141-1161
  • [3] Predictive Performance of Logistic Regression for Imbalanced Data with Categorical Covariate
    Abd Rahman, Hezlin Aryani
    Wah, Yap Bee
    Huat, Ong Seng
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2021, 29 (01): 181-197
  • [4] Performance of evaluation metrics for classification in imbalanced data
    Huayanay, Alex de la Cruz
    Bazan, Jorge L.
    Russo, Cibele M.
    COMPUTATIONAL STATISTICS, 2025, 40 (03): 1447-1473
  • [5] Binary Classification with Imbalanced Data
    Chiang, Jyun-You
    Lio, Yuhlong
    Hsu, Chien-Ya
    Ho, Chia-Ling
    Tsai, Tzong-Ru
    ENTROPY, 2024, 26 (01)
  • [6] Oversampling techniques for imbalanced data in regression
    Belhaouari, Samir Brahim
    Islam, Ashhadul
    Kassoul, Khelil
    Al-Fuqaha, Ala
    Bouzerdoum, Abdesselam
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [7] Longitudinal binary response models using alternative links for medical data
    Huayanay, Alex de la Cruz
    Bazan, Jorge L.
    Diniz, Carlos A. Ribeiro
    BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2023, 37 (02): 365-392
  • [8] A Method for Analyzing the Performance Impact of Imbalanced Binary Data on Machine Learning Models
    Zheng, Ming
    Wang, Fei
    Hu, Xiaowen
    Miao, Yuhao
    Cao, Huo
    Tang, Mingjing
    AXIOMS, 2022, 11 (11)
  • [9] GEV regression with convex loss applied to imbalanced binary classification
    Zhang, Haolin
    Liu, Gongshen
    Pan, Li
    Meng, Kui
    Li, Jianhua
    2016 IEEE FIRST INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC 2016), 2016: 532-537
  • [10] Online Asymmetric Active Learning with Imbalanced Data
    Zhang, Xiaoxuan
    Yang, Tianbao
    Srinivasan, Padmini
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016: 2055-2064