Improving logistic regression on the imbalanced data by a novel penalized log-likelihood function

Cited by: 15
Authors
Zhang, Lili [1]
Geisler, Trent [1]
Ray, Herman [2]
Xie, Ying [3]
Affiliations
[1] Kennesaw State Univ, Analyt & Data Sci PhD Program, Kennesaw, GA 30144 USA
[2] Kennesaw State Univ, Analyt & Data Sci Inst, Kennesaw, GA 30144 USA
[3] Kennesaw State Univ, Dept Informat Technol, Kennesaw, GA 30144 USA
Keywords
Logistic regression; binary classification; imbalanced data; maximum likelihood; penalized log-likelihood function; cost-sensitive; CLASSIFICATION; BINARY;
DOI
10.1080/02664763.2021.1939662
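Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract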
Logistic regression is estimated by maximizing the log-likelihood objective function formulated under the assumption of maximizing the overall accuracy. That does not apply to the imbalanced data. The resulting models tend to be biased towards the majority class (i.e. non-event), which can bring great loss in practice. One strategy for mitigating such bias is to penalize the misclassification costs of observations differently in the log-likelihood function. Existing solutions require either hard hyperparameter estimating or high computational complexity. We propose a novel penalized log-likelihood function by including penalty weights as decision variables for observations in the minority class (i.e. event) and learning them from data along with model coefficients. In the experiments, the proposed logistic regression model is compared with the existing ones on the statistics of area under receiver operating characteristics (ROC) curve from 10 public datasets and 16 simulated datasets, as well as the training time. A detailed analysis is conducted on an imbalanced credit dataset to examine the estimated probability distributions, additional performance measurements (i.e. type I error and type II error) and model coefficients. The results demonstrate that both the discrimination ability and computation efficiency of logistic regression models are improved using the proposed log-likelihood function as the learning objective.
Pages: 3257-3277
Number of pages: 21
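
The abstract mentions penalizing misclassification costs differently for the two classes inside the log-likelihood. As a rough illustration of that general idea only, the sketch below fits a logistic regression with a single fixed weight on event observations. It is not the authors' proposed function, which learns penalty weights from the data. The function names, the synthetic data, the step size and the choice of weight (non-event count divided by event count) are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def weighted_log_likelihood(beta, X, y, w_event):
    """Log-likelihood with a fixed weight w_event on event (y == 1) terms.

    Assumption: one shared, fixed weight. The paper instead treats the
    weights as decision variables learned together with the coefficients.
    """
    p = sigmoid(X @ beta)
    eps = 1e-12  # guards against log(0)
    return np.sum(w_event * y * np.log(p + eps)
                  + (1.0 - y) * np.log(1.0 - p + eps))


def fit_weighted_logistic(X, y, w_event=1.0, lr=0.5, n_iter=5000):
    """Maximize the weighted log-likelihood by plain gradient ascent."""
    beta = np.zeros(X.shape[1])
    weights = np.where(y == 1, w_event, 1.0)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        # Gradient of the weighted log-likelihood with respect to beta.
        grad = X.T @ (weights * (y - p))
        beta += lr * grad / weights.sum()
    return beta


# Synthetic imbalanced data (hypothetical): events are the minority class.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # intercept column plus one feature
y = (rng.random(n) < sigmoid(-3.0 + 1.0 * x)).astype(float)

# Unweighted fit versus a fit that upweights the minority class.
ratio = (y == 0).sum() / (y == 1).sum()
beta_plain = fit_weighted_logistic(X, y, w_event=1.0)
beta_weighted = fit_weighted_logistic(X, y, w_event=ratio)
```

Comparing `beta_plain[0]` with `beta_weighted[0]` shows the main effect of the fixed weight: the intercept moves upward, so estimated event probabilities rise and fewer events are classified as non-events at a given threshold.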