Robust adaptive LASSO in high-dimensional logistic regression
Cited by: 1
Authors:
Basu, Ayanendranath [1]
Ghosh, Abhik [1]
Jaenada, Maria [2]
Pardo, Leandro [2]
Affiliations:
[1] Indian Stat Inst, Interdisciplinary Stat Res Unit, 203 BT Rd, Kolkata 700108, India
[2] Univ Complutense Madrid, Stat & OR, Plaza Ciencias 3, Madrid 28040, Spain
Penalized logistic regression is extremely useful for binary classification with a large number of covariates (larger than the sample size) and has several real-life applications, including genomic disease classification. However, existing methods based on the likelihood loss function are sensitive to data contamination and other noise; hence, robust methods are needed for stable and more accurate inference. In this paper, we propose a family of robust estimators for sparse logistic models utilizing the popular density power divergence based loss function and general adaptively weighted LASSO penalties. We study the local robustness of the proposed estimators through their influence function and also derive their oracle properties and asymptotic distribution. With extensive empirical illustrations, we demonstrate the significantly improved performance of our proposed estimators over the existing ones, with particular gain in robustness. Our proposal is finally applied to analyse four different real datasets for cancer classification, obtaining robust and accurate models that simultaneously perform gene selection and patient classification.
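To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a density power divergence (DPD) loss for the Bernoulli logistic model combined with a weighted L1 penalty. It uses the standard DPD form with tuning parameter alpha > 0; the paper's exact objective may differ by constants or scaling. The function names, the proximal-gradient solver, the fixed step size, and the default values are all assumptions made for illustration. The penalty weights are left to the caller; one common adaptive choice is the reciprocal of the absolute value of an initial estimate, which is also an assumption here.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def dpd_loss(beta, X, y, alpha):
    """Average DPD loss for the logistic model (standard form, alpha > 0)."""
    p = sigmoid(X @ beta)
    # Sum over both outcomes of the model probability raised to 1 + alpha.
    model_term = p ** (1 + alpha) + (1 - p) ** (1 + alpha)
    # Model probability of the observed label, raised to alpha.
    data_term = y * p ** alpha + (1 - y) * (1 - p) ** alpha
    return np.mean(model_term - (1 + 1 / alpha) * data_term)


def dpd_grad(beta, X, y, alpha):
    """Gradient of dpd_loss with respect to beta."""
    p = sigmoid(X @ beta)
    # As alpha -> 0 this factor tends to (p - y), the likelihood score.
    w = (1 + alpha) * (p - y) * (p ** alpha * (1 - p) + p * (1 - p) ** alpha)
    return X.T @ w / len(y)


def soft_threshold(v, t):
    # Proximal operator of the (weighted) L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_dpd_weighted_lasso(X, y, alpha=0.5, lam=0.1, weights=None,
                           step=0.5, n_iter=500):
    """Hypothetical proximal-gradient solver; all coefficients are penalized."""
    n_features = X.shape[1]
    if weights is None:
        weights = np.ones(n_features)  # plain LASSO when no weights are given
    weights = np.asarray(weights, dtype=float)
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        gradient_step = beta - step * dpd_grad(beta, X, y, alpha)
        beta = soft_threshold(gradient_step, step * lam * weights)
    return beta
```

The DPD loss is generally non-convex in beta, so a fixed-step scheme like this one may only reach a local solution and can depend on the starting point; in practice the step size, alpha, and lam would need to be tuned, for example by cross-validation or an information criterion.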