Lei, Jing [1]
Affiliation:
[1] Carnegie Mellon Univ, Dept Stat & Data Sci, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
Keywords:
Cross-validation;
Hypothesis testing;
Model selection;
Overfitting;
Tuning parameter selection;
LASSO
DOI:
10.1080/01621459.2019.1672556
Chinese Library Classification (CLC):
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Subject classification codes:
020208;
070103;
0714;
Abstract:
Cross-validation is one of the most popular methods for model selection and tuning parameter selection in statistics and machine learning. Despite its wide applicability, traditional cross-validation tends to overfit because it ignores the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that accounts for this uncertainty. The method outputs a set of highly competitive candidate models that contains the optimal one with guaranteed probability. As a consequence, it achieves consistent variable selection in the classical linear regression setting, where existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, it offers a different trade-off between prediction accuracy and model interpretability from that of existing variants of cross-validation. We demonstrate the performance of the proposed method on several simulated and real data examples. Supplemental materials for this article are available online.
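The abstract describes the method only at a high level. As an illustration, here is a minimal Python sketch of the general idea, assuming the held-out loss of every candidate model on every sample has already been computed by V-fold cross-validation. For each candidate, it tests the null hypothesis that this candidate is at least as good as every other one, using a studentized maximum statistic calibrated by a Gaussian multiplier bootstrap, and keeps the candidate if the test does not reject. The function name `cvc_confidence_set` and its parameters are hypothetical, the loss differences are centred globally rather than fold by fold, and this is a simplified sketch, not the paper's exact procedure.

```python
import math
import random
from statistics import fmean, stdev


def cvc_confidence_set(losses, alpha=0.05, n_boot=1000, seed=0):
    """Return indices of candidate models retained at level alpha.

    Hypothetical helper (simplified sketch, not the paper's exact method).
    losses[i][m] is the held-out loss of candidate m on sample i, assumed to
    come from V-fold cross-validation (each sample held out exactly once).
    Loss differences are centred by their overall mean, not fold by fold.
    """
    rng = random.Random(seed)
    n = len(losses)
    n_models = len(losses[0])
    keep = []
    for m in range(n_models):
        others = [j for j in range(n_models) if j != m]
        # Per-sample loss differences: positive values mean m did worse than j.
        diffs = [[row[m] - row[j] for row in losses] for j in others]
        means = [fmean(d) for d in diffs]
        sds = [stdev(d) for d in diffs]
        # Skip comparisons with zero spread (identical losses) to avoid dividing by zero.
        cols = [k for k, s in enumerate(sds) if s > 0]
        if not cols:
            keep.append(m)
            continue
        # Studentized max statistic for H0: m is at least as good as every other candidate.
        t_obs = max(math.sqrt(n) * means[k] / sds[k] for k in cols)
        # Gaussian multiplier bootstrap approximation to the null distribution of the max.
        exceed = 0
        for _ in range(n_boot):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            t_star = max(
                sum(zi * (di - means[k]) for zi, di in zip(z, diffs[k]))
                / (math.sqrt(n) * sds[k])
                for k in cols
            )
            if t_star >= t_obs:
                exceed += 1
        p_value = (exceed + 1) / (n_boot + 1)
        # Retain m unless the data show it is significantly worse than some rival.
        if p_value > alpha:
            keep.append(m)
    return keep


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Synthetic held-out losses: models 0 and 1 are comparable, model 2 is clearly worse.
    demo = []
    for _ in range(200):
        base = demo_rng.expovariate(1.0)
        demo.append([
            base + demo_rng.gauss(0, 0.1),
            base + demo_rng.gauss(0, 0.1),
            base + 1.0 + demo_rng.gauss(0, 0.1),
        ])
    print(cvc_confidence_set(demo, alpha=0.05, n_boot=500))
```

In this toy setting, a candidate whose held-out loss is clearly worse than a rival's is excluded, while near-ties can both survive, which is how such a set trades a single "winner" for a set of competitive models. A larger `alpha` rejects more often and therefore yields a smaller set.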
Affiliation:
Univ Minnesota, Div Biostat & Hlth Data Sci, Minneapolis, MN 55414 USA
Dai, Biyue
Breheny, Patrick
Affiliation:
Univ Iowa, Dept Biostat, Iowa City, IA USA
Univ Minnesota, Div Biostat, Minneapolis, MN 55414 USA
Affiliation:
Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA
Stanford Univ, Dept Biomed Data Sci, 735 Campus Dr, Apt 888A, Stanford, CA 94305 USA
Sun, Min Woo
Tibshirani, Robert
Affiliation:
Stanford Univ, Dept Biomed Data Sci, Stanford, CA USA
Stanford Univ, Dept Stat, Stanford, CA USA
Affiliation:
Hong Kong Univ Sci & Technol, Dept Math, Kowloon, Hong Kong, Peoples R China
Yu, Chi-Wai
Clarke, Bertrand
Affiliation:
Univ Nebraska, Dept Stat, Lincoln, NE 68583 USA