A LARGE SCALE ANALYSIS OF LOGISTIC REGRESSION: ASYMPTOTIC PERFORMANCE AND NEW INSIGHTS

Cited by: 0
Authors
Mai, Xiaoyi [1 ,2 ]
Liao, Zhenyu [1 ,2 ]
Couillet, Romain [1 ,2 ]
Affiliations
[1] Univ Paris Saclay, Cent Supelec, St Aubin, France
[2] Univ Grenoble Alpes, GIPSA Lab, Grenoble, France
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
High dimensional statistics; logistic regression; machine learning; random matrix theory
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Logistic regression, one of the most popular machine learning methods for binary classification, has long been believed to be unbiased. In this paper, we consider the "hard" classification problem of separating high-dimensional Gaussian vectors, where the data dimension p and the sample size n are both large. Based on recent advances in random matrix theory (RMT) and high-dimensional statistics, we evaluate the asymptotic distribution of the logistic regression classifier and, consequently, provide the associated classification performance. This brings new insights into the internal mechanism of the logistic regression classifier, including a possible bias in the separating hyperplane, as well as into practical issues such as hyper-parameter tuning, thereby opening the door to novel RMT-inspired improvements.
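The setting described in the abstract (two Gaussian classes, with p and n both large and comparable) can be illustrated with a small simulation. The sketch below is not the authors' analysis or code: the class means of +mu and -mu, the identity covariance, the ridge penalty `lam`, the plain gradient-descent solver, and the function name `simulate_logistic` are all assumptions chosen for illustration. It only shows how one might measure empirically the test accuracy and the alignment between the learned direction and the true mean direction when p/n is not small.

```python
import numpy as np

def simulate_logistic(n=400, p=200, mu_norm=2.0, lam=1.0, n_test=2000, seed=0):
    """Fit ridge-regularized logistic regression on a two-class Gaussian
    mixture x = y * mu + z, z ~ N(0, I_p), y in {-1, +1} (assumed model)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[0] = mu_norm  # assumed mean direction: first coordinate axis

    def sample(m):
        y = rng.choice([-1.0, 1.0], size=m)
        X = y[:, None] * mu + rng.standard_normal((m, p))
        return X, y

    X, y = sample(n)

    # Upper bound on the curvature of the objective:
    # (1/4) * ||X||_2^2 / n from the logistic loss, plus lam from the penalty.
    L = 0.25 * np.linalg.norm(X, 2) ** 2 / n + lam

    beta = np.zeros(p)
    for _ in range(500):
        margins = y * (X @ beta)
        # sigma(-margin) = 1 / (1 + exp(margin)), computed stably
        w = np.exp(-np.logaddexp(0.0, margins))
        # Gradient of (1/n) sum log(1 + exp(-margin_i)) + (lam/2) ||beta||^2
        grad = -(X.T @ (w * y)) / n + lam * beta
        beta -= grad / L

    X_test, y_test = sample(n_test)
    accuracy = np.mean(np.sign(X_test @ beta) == y_test)
    # Cosine between the learned direction and the true mean direction
    cosine = beta @ mu / (np.linalg.norm(beta) * np.linalg.norm(mu))
    return beta, accuracy, cosine

if __name__ == "__main__":
    beta, accuracy, cosine = simulate_logistic()
    print(f"test accuracy: {accuracy:.3f}, alignment with mu: {cosine:.3f}")
```

Varying the ratio p/n or the penalty `lam` in this sketch gives a rough empirical view of the kind of high-dimensional effects and hyper-parameter sensitivity that the abstract refers to; it does not reproduce the paper's asymptotic results.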
Pages: 3357-3361
Number of pages: 5