Feature Selection using e-values

Cited by: 0

Authors:

Majumdar, Subhabrata [1,2]

Chatterjee, Snigdhansu [1]

Affiliations:

[1] Univ Minnesota Twin Cities, Sch Stat, Minneapolis, MN 55455 USA

[2] Splunk, San Francisco, CA 94107 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Funding:

US National Science Foundation;

Keywords:

VARIABLE SELECTION; MODEL; REGRESSION; BOOTSTRAP; DEPTH; DIMENSION; LASSO;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the context of supervised parametric models, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. The e-values are applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure using e-values, and we provide consistency results. For a p-dimensional feature space, this procedure requires fitting only the full model and evaluating p + 1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. Through experiments across several model settings and on synthetic and real datasets, we establish the e-values method as a promising general alternative to existing model-specific methods of feature selection.
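The abstract's idea of "fit the full model once, then evaluate p + 1 models" can be illustrated with a minimal sketch. This is not the authors' implementation: this record does not specify the resampling scheme, the depth function, or the selection rule, so the sketch assumes ordinary least squares, a plain nonparametric bootstrap, Mahalanobis depth, and a drop-one rule (keep feature j when zeroing its coefficient lowers the e-value below that of the full model). All function names are hypothetical.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth 1 / (1 + d^2) of each row of `points`
    with respect to the point cloud `reference` (an assumed depth choice)."""
    mu = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)


def bootstrap_ols(X, y, n_boot, rng):
    """Approximate the sampling distribution of the full-model OLS
    coefficients with a plain bootstrap (an assumed stand-in for the
    paper's resampling algorithm)."""
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return draws


def select_features(X, y, n_boot=500, seed=0):
    """Evaluate p + 1 models (full model plus p drop-one models) using
    only resampled full-model estimates; no submodel is refitted."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    draws = bootstrap_ols(X, y, n_boot, rng)

    # e-value of the full model: mean depth of its own draws.
    e_full = mahalanobis_depth(draws, draws).mean()

    # e-value of each drop-one model: zero out coefficient j in the draws
    # and measure how deep the result sits in the full-model cloud.
    e_drop = np.empty(p)
    for j in range(p):
        reduced = draws.copy()
        reduced[:, j] = 0.0
        e_drop[j] = mahalanobis_depth(reduced, draws).mean()

    # Assumed rule: feature j matters if removing it lowers the e-value.
    selected = np.flatnonzero(e_drop < e_full)
    return selected, e_full, e_drop


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 5))
    beta = np.array([2.0, -1.5, 0.0, 0.0, 1.0])  # synthetic ground truth
    y = X @ beta + rng.normal(size=400)
    selected, e_full, e_drop = select_features(X, y, n_boot=200)
    print("selected:", selected)
    print("full-model e-value:", e_full)
    print("drop-one e-values:", e_drop)
```

The loop runs over p drop-one models plus the full model, which is where the p + 1 count comes from, in contrast to enumerating all 2^p subsets. With the plain bootstrap used here, strong signals are separated clearly, but the treatment of irrelevant features depends on tuning that this sketch does not attempt to reproduce.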


Pages: 21
