A Note on Cross-Validation for Lasso Under Measurement Errors

Cited by: 6
Authors
Datta, Abhirup [1]
Zou, Hui [2]
Affiliations
[1] Johns Hopkins Univ, Dept Biostat, 615 N Wolfe St,E3527, Baltimore, MD 21205 USA
[2] Univ Minnesota, Dept Stat, Minneapolis, MN USA
Keywords
Cross-validation; Inconsistency; Lasso; Measurement errors; REGRESSION; SELECTION; CONSISTENCY; VARIABLES;
DOI
10.1080/00401706.2019.1668856
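Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes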
020208 ; 070103 ; 0714 ;
Abstract
Variants of the Lasso or ℓ1-penalized regression have been proposed to accommodate the presence of measurement errors in the covariates. Theoretical guarantees for these estimators have been established for some oracle values of the regularization parameters, which are not known in practice. Data-driven tuning such as cross-validation has not been studied when covariates contain measurement errors. We demonstrate that in the presence of errors in the covariates, even when using a Lasso variant that adjusts for measurement error, application of naive leave-one-out cross-validation to select the tuning parameter can be problematic. We provide an example where such a practice leads to estimation inconsistency. We also prove that a simple correction to the cross-validation procedure restores consistency. We further study the risk consistency of the two cross-validation procedures and offer guidelines on the choice of cross-validation based on the measurement error distributions of the training and the prediction data. The theoretical findings are validated using simulated data. Supplementary materials for this article are available online.
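The abstract does not spell out the correction, so the following is only an illustrative sketch of why a naive validation loss can mislead when covariates are observed with error. It assumes additive, mean-zero measurement error U that is independent of the true covariates and of the response noise, with a known covariance matrix (called `sigma_u` here). Under those assumptions, the squared-error loss computed on the noisy covariates W = X + U is inflated, in expectation, by beta' Sigma_u beta, and subtracting that term removes the bias. The function names and the simulated data are hypothetical and are not taken from the article.

```python
import numpy as np


def naive_loss(W, y, beta):
    """Mean squared residual computed directly on the covariates W."""
    r = y - W @ beta
    return float(r @ r) / len(y)


def corrected_loss(W, y, beta, sigma_u):
    """Naive loss minus beta' Sigma_u beta (assumed-known error covariance)."""
    return naive_loss(W, y, beta) - float(beta @ sigma_u @ beta)


rng = np.random.default_rng(0)
n, p = 200_000, 5
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
sigma_u = 0.25 * np.eye(p)  # assumption: measurement error covariance is known

X = rng.normal(size=(n, p))             # clean covariates, unobserved in practice
U = rng.normal(scale=0.5, size=(n, p))  # additive measurement error, variance 0.25
W = X + U                               # what the analyst actually observes
y = X @ beta_true + rng.normal(size=n)  # response generated from clean covariates

clean = naive_loss(X, y, beta_true)                   # loss on clean covariates
naive = naive_loss(W, y, beta_true)                   # inflated by beta' Sigma_u beta
corrected = corrected_loss(W, y, beta_true, sigma_u)  # bias removed

print(f"clean={clean:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

Because the inflation term depends on the candidate coefficient vector, it differs across tuning-parameter values, which is one way a naive criterion could favor the wrong amount of shrinkage. The sketch evaluates the losses at a fixed coefficient vector only; it does not fit a Lasso variant or run a full cross-validation loop.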
Pages: 549-556
Page count: 8