Performance of asymmetric links and correction methods for imbalanced data in binary regression

Times cited: 10
Authors
Huayanay, Alex de la Cruz [1]
Bazan, Jorge L. [2]
Cancho, Vicente G. [2]
Dey, Dipak K. [3]
Affiliations
[1] USP UFSCar, Interinst Grad Stat, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Dept Appl Math & Stat, Sao Carlos, SP, Brazil
[3] Univ Connecticut, Dept Stat, Mansfield, CT USA
Funding
São Paulo Research Foundation, Brazil
Keywords
Asymmetric link; binary regression; imbalanced data; predictive evaluation; quantile residuals; similarity measures; CROSS-VALIDATION; MODEL; PROBIT;
DOI
10.1080/00949655.2019.1593984
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In binary regression, imbalanced data arise when values equal to zero (or one) occur in a proportion significantly greater than that of the corresponding values of one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them to the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct bias in the estimation of regression coefficients and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among those proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.
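As a rough illustration of what an asymmetric link looks like, the sketch below evaluates a power link and a reverse power link built on a logistic baseline. This record does not give the paper's exact definitions, so the forms used here, p = F(eta)^lambda and p = 1 - F(-eta)^lambda, the logistic choice of F, and the function names are all assumptions made for illustration.

```python
import math


def logistic_cdf(x: float) -> float:
    """Baseline symmetric CDF F (assumed logistic for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def power_logit(eta: float, lam: float) -> float:
    """Assumed power link: p = F(eta) ** lam, with lam > 0."""
    return logistic_cdf(eta) ** lam


def reverse_power_logit(eta: float, lam: float) -> float:
    """Assumed reverse power link: p = 1 - F(-eta) ** lam, with lam > 0."""
    return 1.0 - logistic_cdf(-eta) ** lam


if __name__ == "__main__":
    # lam = 1 recovers the symmetric logistic link in both cases;
    # other values of lam skew the response curve in opposite directions.
    for lam in (0.5, 1.0, 2.0):
        p_pow = power_logit(0.0, lam)
        p_rev = reverse_power_logit(0.0, lam)
        print(f"lambda={lam}: power={p_pow:.4f}, reverse power={p_rev:.4f}")
```

Under these assumed forms, the two links mirror each other: at a linear predictor of zero, a shape parameter above one pushes the power link below 0.5 and the reverse power link above 0.5, which is the kind of skew that can suit data dominated by zeros or by ones.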
Pages: 1694-1714
Number of pages: 21
Related papers
50 in total
  • [31] Learning from Imbalanced Data Using Methods of Sample Selection
    Chairi, Ikram
    Alaoui, Souad
    Lyhyaoui, Abdelouahid
    2012 INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS (ICMCS), 2012, : 256 - 259
  • [32] A Novel Imbalanced Data Classification Approach Based on Logistic Regression and Fisher Discriminant
    Shi, Baofeng
    Wang, Jing
    Qi, Junyan
    Cheng, Yanqiu
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [33] Entropy-Based Fuzzy Weighted Logistic Regression for Classifying Imbalanced Data
    Harumeka, Ajiwasesa
    Purnami, Santi Wulan
    Rahayu, Santi Puteri
    SOFT COMPUTING IN DATA SCIENCE, SCDS 2021, 2021, 1489 : 312 - 327
  • [34] Weighted logistic regression for large-scale imbalanced and rare events data
    Maalouf, Maher
    Siddiqi, Mohammad
    KNOWLEDGE-BASED SYSTEMS, 2014, 59 : 142 - 148
  • [35] Influence on smoothness in penalized likelihood regression for binary data
    Jernigan, R
    O'Connell, J
    COMPUTATIONAL STATISTICS, 2001, 16 (04) : 481 - 504
  • [36] Influence on Smoothness in Penalized Likelihood Regression for Binary Data
    Robert Jernigan
    Julie O’Connell
    Computational Statistics, 2001, 16 : 481 - 504
  • [37] A Note on Local Likelihood Regression for Binary Response Data
    Okumura, Hidenori
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2009, 38 (05) : 1019 - 1025
  • [38] A novel twin-support vector machine for binary classification to imbalanced data
    Li, Jingyi
    Chao, Shiwei
    DATA TECHNOLOGIES AND APPLICATIONS, 2023, 57 (03) : 385 - 396
  • [39] On assessing binary regression models based on ungrouped data
    Lu, Chunling
    Yang, Yuhong
    BIOMETRICS, 2019, 75 (01) : 5 - 12
  • [40] Performance Evaluation of Anomaly Detection in Imbalanced System Log Data
    Studiawan, Hudan
    Sohel, Ferdous
    PROCEEDINGS OF THE 2020 FOURTH WORLD CONFERENCE ON SMART TRENDS IN SYSTEMS, SECURITY AND SUSTAINABILITY (WORLDS4 2020), 2020, : 239 - 246