A modification of logistic regression with imbalanced data: F-measure-oriented Lasso-logistic regression

Cited by: 0
Authors
My, Bui T. T. [1, 2]
Ta, Bao Q. [3 ]
Affiliations
[1] Ho Chi Minh Univ Banking, Dept Math Econ, Ho Chi Minh City 7000, Vietnam
[2] UEH Univ, Coll Technol & Design, Fac Math & Stat, Ho Chi Minh City 7000, Vietnam
[3] Vietnam Natl Univ, Int Univ, Dept Math, Ho Chi Minh City 7000, Vietnam
Source
SCIENCEASIA | 2023 / Volume 49
Keywords
cross-validation; F-measure; ridge; SMOTE; classification; probabilities
DOI
10.2306/scienceasia1513-1874.2023.s003
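Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];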
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Logistic regression (LR) is one of the most popular classifiers, but it does not perform effectively on imbalanced data. There are two approaches to imbalanced data for LR: resampling techniques and modifications to the log-likelihood function. These approaches improve the performance measures of LR in some cases, but their effectiveness is not robust in general. In this paper, we propose a classifier called F-measure-oriented Lasso-Logistic Regression (F-LLR) to deal with imbalanced data. The base learner of F-LLR is Lasso-Logistic Regression (LLR), which imposes a prior on the magnitude of the parameters through a hyper-parameter λ. The optimal λ is determined by an adjusted cross-validation procedure that aims for the highest F-measure instead of the highest accuracy. F-LLR addresses imbalanced data by selectively combining under-sampling and the Synthetic Minority Oversampling Technique (SMOTE), based on the scores of the training data. The empirical study shows that F-LLR increases the F-measure and KS compared with LLR and the traditional balancing methods, such as the resampling techniques (random under-sampling, random over-sampling, and SMOTE) and the modifications to the log-likelihood function (Ridge and weighted likelihood estimation).
Pages: 68-77
Page count: 10
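
The abstract's point that λ is chosen for the highest F-measure rather than the highest accuracy can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two candidate λ values, the two folds of labels and predictions, and the helper names `accuracy`, `f_measure`, and `select_by_score` are hypothetical and are not taken from the paper, whose adjusted cross-validation procedure is not detailed in this record.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Fold = Tuple[Sequence[int], Sequence[int]]  # (true labels, predicted labels)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]):
    """Return (tp, fp, fn, tn), treating label 1 as the minority (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score of the positive class; defined as 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom


def select_by_score(fold_results: Dict[float, List[Fold]],
                    score: Callable[[Sequence[int], Sequence[int]], float]) -> float:
    """Return the candidate whose mean validation score across folds is highest."""
    mean_score = {
        lam: sum(score(t, p) for t, p in folds) / len(folds)
        for lam, folds in fold_results.items()
    }
    return max(mean_score, key=mean_score.get)


# Hypothetical validation folds: 10 samples each, one positive per fold.
y_fold1 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_fold2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

fold_results = {
    # Assumed heavy penalty: the model predicts the majority class everywhere.
    1.0: [(y_fold1, [0] * 10), (y_fold2, [0] * 10)],
    # Assumed lighter penalty: finds each positive at the cost of false alarms.
    0.1: [(y_fold1, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]),
          (y_fold2, [0, 1, 0, 0, 1, 0, 0, 0, 0, 0])],
}

best_by_accuracy = select_by_score(fold_results, accuracy)
best_by_f_measure = select_by_score(fold_results, f_measure)
```

In this toy setup the accuracy criterion prefers the candidate that never predicts the minority class, while the F-measure criterion prefers the one that actually detects it, which is the kind of behaviour that motivates an F-measure-oriented selection on imbalanced data.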