Approximate Leave-One-Out Cross Validation for Regression With ℓ1 Regularizers

Times Cited: 0
Authors
Auddy, Arnab [1 ]
Zou, Haolin [2 ]
Rad, Kamiar Rahnama [3 ]
Maleki, Arian [2 ]
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
[2] Columbia Univ, Dept Stat, New York, NY 10032 USA
[3] CUNY, Baruch Coll, New York, NY 10031 USA
Funding
U.S. National Science Foundation;
Keywords
Computational modeling; Signal to noise ratio; Perturbation methods; Measurement; Reviews; Linear regression; Computational efficiency; High dimensional statistics; empirical risk minimization; regularization; cross validation; approximate leave-one-out; elastic net; LASSO;
DOI
10.1109/TIT.2024.3450002
CLC Number
TP [Automation & Computer Technology];
Subject Classification Code
0812;
Abstract
The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding approach to estimating OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, a theoretical understanding of ALO's error has been lacking. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with non-differentiable regularizers. We bound the error |ALO − LO| in terms of intuitive metrics such as the size of the leave-i-out perturbations of the active sets, the sample size n, the number of features p, and the regularization parameters. As a consequence, for ℓ1-regularized problems we show that |ALO − LO| → 0 as p → ∞, while n/p and the signal-to-noise ratio (SNR) remain bounded.
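To make the idea concrete, below is a minimal sketch of the standard ALO correction for the LASSO in linear regression: the leave-i-out residual is approximated by inflating the full-data residual by the leverage of observation i on the active set of the full-data fit. This illustrates the general construction the abstract describes, not the paper's exact estimator or notation; the function name, the regularization scaling, and the active-set tolerance are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

def alo_lasso_mse(X, y, lam, tol=1e-8):
    """ALO estimate of the leave-one-out MSE for the LASSO (illustrative).

    Fits min_b 0.5*||y - X b||^2 + lam*||b||_1 once on the full data,
    then corrects each residual by a leverage factor computed on the
    active set, instead of refitting n leave-one-out models.
    """
    n = X.shape[0]
    # sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
    # so alpha = lam / n matches the objective above (scaling is an assumption).
    beta = Lasso(alpha=lam / n, fit_intercept=False, max_iter=100_000).fit(X, y).coef_
    resid = y - X @ beta

    active = np.abs(beta) > tol  # active set A of the full-data solution
    if active.any():
        XA = X[:, active]
        # Leverages h_i = diag of the hat matrix X_A (X_A^T X_A)^{-1} X_A^T.
        h = np.einsum("ij,ij->i", XA, np.linalg.solve(XA.T @ XA, XA.T).T)
    else:
        h = np.zeros(n)  # empty active set: no leverage correction

    # ALO leave-i-out residual: inflate the full-data residual by 1/(1 - h_i).
    return np.mean((resid / (1.0 - h)) ** 2)
```

Sweeping lam over a grid and minimizing alo_lasso_mse selects the regularization level at the cost of one fit per grid point, rather than the n refits per grid point that exact LO would require.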
Pages: 8040-8071
Page Count: 32