LOCAL UNCERTAINTY SAMPLING FOR LARGE-SCALE MULTICLASS LOGISTIC REGRESSION

Cited by: 18
Authors
Han, Lei [1]
Tan, Kean Ming [2]
Yang, Ting [3]
Zhang, Tong [4, 5]
Affiliations
[1] Tencent Technol Shenzhen Co Ltd, Tencent AI Lab, Shenzhen, Peoples R China
[2] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
[3] Yelp Inc, San Francisco, CA USA
[4] Hong Kong Univ Sci & Technol, Dept Math, Hong Kong, Peoples R China
[5] Hong Kong Univ Sci & Technol, Dept Comp Sci, Hong Kong, Peoples R China
Source
ANNALS OF STATISTICS | 2020, Vol. 48, No. 3
Keywords
Sampling; large-scale; multiclass logistic regression; MODELS;
DOI
10.1214/19-AOS1867
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. We propose a general subsampling scheme for large-scale multiclass logistic regression and examine the variance of the resulting estimator. We show that asymptotically, the proposed method always achieves a smaller variance than that of the uniform random sampling. Moreover, when the classes are conditionally imbalanced, significant improvement over uniform sampling can be achieved. Empirical performance of the proposed method is evaluated and compared to other methods via both simulated and real-world datasets, and these results match and confirm our theoretical analysis.
Pages: 1770-1788
Number of pages: 19