Robust adaptive LASSO in high-dimensional logistic regression

Cited by: 1

Authors:

Basu, Ayanendranath [1]

Ghosh, Abhik [1]

Jaenada, Maria [2]

Pardo, Leandro [2]

Affiliations:

[1] Indian Stat Inst, Interdisciplinary Stat Res Unit, 203 BT Rd, Kolkata 700108, India

[2] Univ Complutense Madrid, Stat & OR, Plaza Ciencias 3, Madrid 28040, Spain

Source:

STATISTICAL METHODS AND APPLICATIONS | 2024 / Volume 33 / Issue 05

Keywords:

Density power divergence; High-dimensional data; Logistic regression; Oracle properties; Variable selection; GENE SELECTION; SPARSE REGRESSION; CLASSIFICATION; CANCER; MICROARRAYS; LIKELIHOOD; ALGORITHM; MODELS

DOI:

10.1007/s10260-024-00760-2

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Discipline classification codes:

020208; 070103; 0714

Abstract:

Penalized logistic regression is extremely useful for binary classification with a large number of covariates (larger than the sample size), and it has several real-life applications, including genomic disease classification. However, the existing methods based on the likelihood loss function are sensitive to data contamination and other noise; hence, robust methods are needed for stable and more accurate inference. In this paper, we propose a family of robust estimators for sparse logistic models utilizing the popular density power divergence based loss function and the general adaptively weighted LASSO penalties. We study the local robustness of the proposed estimators through their influence function and also derive their oracle properties and asymptotic distribution. With extensive empirical illustrations, we demonstrate the significantly improved performance of our proposed estimators over the existing ones, with particular gain in robustness. Our proposal is finally applied to analyse four different real datasets for cancer classification, obtaining robust and accurate models that simultaneously perform gene selection and patient classification.
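
As a rough illustration of the two ingredients named in the abstract, the sketch below evaluates a density power divergence (DPD) loss for a Bernoulli (logistic) model and adds an adaptively weighted LASSO penalty. It is not the authors' implementation: the function names, the tuning parameter names (alpha, lam, gamma), the specific weight form 1 / |beta_init|^gamma, and the small constant eps are all assumptions made for this example, and the intercept is not treated separately.

```python
import numpy as np


def dpd_logistic_loss(beta, X, y, alpha):
    """Average DPD loss for a logistic model; requires alpha > 0."""
    # Success probabilities under the logistic model.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    # Sum over both outcomes y in {0, 1} of f(y | x)^(1 + alpha).
    model_term = p ** (1 + alpha) + (1 - p) ** (1 + alpha)
    # Model probability assigned to the observed label.
    f_obs = np.where(y == 1, p, 1 - p)
    # A badly fitted (possibly contaminated) label contributes at most
    # (1 + 1/alpha), unlike the unbounded negative log-likelihood.
    data_term = (1 + 1 / alpha) * f_obs ** alpha
    return np.mean(model_term - data_term)


def adaptive_lasso_penalty(beta, beta_init, lam, gamma=1.0, eps=1e-8):
    """Weighted L1 penalty; weights come from an initial estimate (assumed form)."""
    weights = 1.0 / (np.abs(beta_init) + eps) ** gamma
    return lam * np.sum(weights * np.abs(beta))


def penalized_objective(beta, X, y, alpha, beta_init, lam, gamma=1.0):
    """Objective to be minimized over beta: DPD loss plus adaptive LASSO penalty."""
    return dpd_logistic_loss(beta, X, y, alpha) + adaptive_lasso_penalty(
        beta, beta_init, lam, gamma
    )
```

Larger values of alpha downweight observations that the model fits poorly, trading some efficiency for robustness. The objective is non-smooth in beta because of the absolute values, so a plain gradient method would not be appropriate; the choice of optimizer is left open here.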

Pages: 1217-1249

Page count: 33

References: 53 in total

