Penalized robust estimators in sparse logistic regression

Cited by: 6
Authors
Bianco, Ana M. [1, 2]
Boente, Graciela [3, 4]
Chebi, Gonzalo [3, 4]
Affiliations
[1] Univ Buenos Aires, Fac Ciencias Exactas & Nat, Inst Calculo, Ciudad Univ,Pabellon 2, RA-1428 Buenos Aires, DF, Argentina
[2] Consejo Nacl Invest Cient & Tecn, Ciudad Univ,Pabellon 2, RA-1428 Buenos Aires, DF, Argentina
[3] Univ Buenos Aires, Fac Ciencias Exactas & Nat, Dept Matemat, Ciudad Univ,Pabellon 1, RA-1428 Buenos Aires, DF, Argentina
[4] Consejo Nacl Invest Cient & Tecn, Ciudad Univ,Pabellon 1, RA-1428 Buenos Aires, DF, Argentina
Keywords
Logistic regression; Outliers; Penalty functions; Robust estimation; Sparse models; VARIABLE SELECTION; LIKELIHOOD; INFERENCE; LASSO
DOI
10.1007/s11749-021-00792-w
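Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification code
020208; 070103; 0714
Abstract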
Sparse covariates arise frequently in classification and regression problems, where variable selection is usually of interest. As is well known, sparse statistical models correspond to situations where only a small number of parameters are nonzero and, for that reason, they are much easier to interpret than dense ones. In this paper, we focus on the logistic regression model, and our aim is to address robust and penalized estimation of the regression parameter. We introduce a family of penalized weighted M-type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions, including the so-called Sign penalty. We provide a careful analysis of the estimators' convergence rates, as well as their variable selection capability and asymptotic distribution, for fixed and random penalties. A robust cross-validation criterion is also proposed. Through a numerical study, we compare the finite-sample performance of the classical and robust penalized estimators under different contamination scenarios. The analysis of real datasets enables us to investigate the stability of the penalized estimators in the presence of outliers.
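For orientation, a penalized weighted M-type estimator of the kind the abstract describes can be sketched in the following generic form. The notation is assumed for illustration and is not taken from this record: $\phi$ stands for a bounded loss, $w$ for a weight function that downweights high-leverage covariates, $I_\lambda$ for the penalty, and $\lambda$ for the tuning parameter. The Sign penalty is written here as the ratio of the $\ell_1$ and $\ell_2$ norms, which should be checked against the paper's own definition.

```latex
% Generic penalized weighted M-type estimator (notation assumed, not from the record)
\[
  \hat{\beta}_n
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
    \; \frac{1}{n} \sum_{i=1}^{n} \phi\!\left(y_i, x_i^{\top}\beta\right) w(x_i)
    \; + \; I_{\lambda}(\beta)
\]

% Two example penalties: the LASSO (\ell_1) penalty and the Sign penalty (assumed form)
\[
  I_{\lambda}^{\mathrm{L1}}(\beta) = \lambda \,\|\beta\|_1,
  \qquad
  I_{\lambda}^{\mathrm{Sign}}(\beta) = \lambda \,\frac{\|\beta\|_1}{\|\beta\|_2}
  \quad (\beta \neq 0).
\]
```

Taking $w \equiv 1$ and $\phi$ equal to the deviance (negative log-likelihood) loss would give the classical penalized estimator that the numerical study uses as a benchmark; a bounded $\phi$ together with decreasing weights is what limits the influence of atypical observations.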
Pages: 563-594
Page count: 32
Related papers
34 in total
[1]   [Anonymous], 2015, Modern Nonparametric, Robust and Multivariate Methods: Festschrift in Honour of Hannu Oja, DOI 10.1007/978-3-319-22404-6_19
[2]   Robust and consistent variable selection in high-dimensional generalized linear models [J].
Avella-Medina, Marco ;
Ronchetti, Elvezio .
BIOMETRIKA, 2018, 105 (01) :31-44
[3]   A Wald-type test statistic for testing linear hypothesis in logistic regression models based on minimum density power divergence estimator [J].
Basu, Ayanendranath ;
Ghosh, Abhik ;
Mandal, Abhijit ;
Martin, Nirian ;
Pardo, Leandro .
ELECTRONIC JOURNAL OF STATISTICS, 2017, 11 (02) :2741-2772
[4]   Bianco A.M., 1996, ROBUST STAT DATA ANA, V109, P17, DOI 10.1007/978-1-4612-2380-1_2
[5]   Robust testing in the logistic regression model [J].
Bianco, Ana M. ;
Martinez, Elena .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2009, 53 (12) :4095-4105
[6]   Minimum distance estimation for the logistic regression model [J].
Bondell, HD .
BIOMETRIKA, 2005, 92 (03) :724-731
[7]   A characteristic function approach to the biased sampling model, with application to robust logistic regression [J].
Bondell, Howard D. .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2008, 138 (03) :742-755
[8]   Robust inference for generalized linear models [J].
Cantoni, E ;
Ronchetti, E .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2001, 96 (455) :1022-1030
[9]   Robust Parametric Classification and Variable Selection by a Minimum Distance Criterion [J].
Chi, Eric C. ;
Scott, David W. .
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2014, 23 (01) :111-128
[10]   Implementing the Bianco and Yohai estimator for logistic regression [J].
Croux, C ;
Haesbroeck, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2003, 44 (1-2) :273-295