Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

Cited by: 0

Authors:

Arlot, Sylvain [1]

Lerasle, Matthieu [2]

Affiliations:

[1] Univ Paris Saclay, Univ Paris Sud, CNRS, Lab Math Orsay, F-91405 Orsay, France

[2] Univ Nice Sophia Antipolis, CNRS, LJAD, UMR 7351, F-06100 Nice, France

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2016 / Vol. 17

Keywords:

V-fold cross-validation; Monte-Carlo cross-validation; leave-one-out; leave-p-out; resampling penalties; density estimation; model selection; penalization; OPTIMAL-MODEL SELECTION; REGRESSION; INEQUALITIES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1 + 4/(V - 1), at least in some particular cases, suggesting that the performance improves substantially from V = 2 to V = 5 or 10, and then is almost constant. Overall, this can explain the common advice to take V = 5 (at least in our setting and when computational power is limited), as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
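To make the stated dependence on V concrete, the short Python sketch below tabulates the factor 1 + 4/(V - 1) for a few values of V. It illustrates only the V-dependence quoted in the abstract, which the authors state holds at least in some particular cases. The function name `variance_factor` and the chosen values of V are assumptions made for this example, and any multiplicative constant in the actual variance is not modeled.

```python
def variance_factor(v: int) -> float:
    """Return 1 + 4/(V - 1), the V-dependence quoted in the abstract.

    Hypothetical helper for illustration only; it is not the full
    variance formula from the paper, just its dependence on V.
    """
    if v < 2:
        raise ValueError("V-fold cross-validation needs V >= 2")
    return 1.0 + 4.0 / (v - 1)


# Illustrative choices of V (an assumption of this sketch).
factors = {v: variance_factor(v) for v in (2, 5, 10, 20, 100)}

for v, f in factors.items():
    print(f"V = {v:3d}: factor = {f:.3f}")
```

The factor drops from 5 at V = 2 to 2 at V = 5 and to about 1.44 at V = 10, while its limit as V grows is 1. Most of the possible reduction is therefore already obtained by V = 5 or 10, which matches the abstract's remark that performance is almost constant beyond that point.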

Citations

Pages: 50

17 entries in total

[1] An empirical comparison of V-fold penalisation and cross-validation for model selection in distribution-free regression
Dhanjal, Charanpal
Baskiotis, Nicolas
Clemencon, Stephan
Usunier, Nicolas
PATTERN ANALYSIS AND APPLICATIONS, 2016, 19 (01) : 41 - 53
[2] ASYMPTOTICS FOR LEAST-SQUARES CROSS-VALIDATION BANDWIDTHS IN NONSMOOTH CASES
VAN ES, B
ANNALS OF STATISTICS, 1992, 20 (03) : 1647 - 1657
[3] OPTIMAL CROSS-VALIDATION IN DENSITY ESTIMATION WITH THE L2-LOSS
Celisse, Alain
ANNALS OF STATISTICS, 2014, 42 (05) : 1879 - 1910
[4] Fast exact leave-one-out cross-validation of sparse least-squares support vector machines
Cawley, GC
Talbot, NLC
NEURAL NETWORKS, 2004, 17 (10) : 1467 - 1475
[5] MODIFIED CROSS-VALIDATION IN DENSITY-ESTIMATION
STUTE, W
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1992, 30 (03) : 293 - 305
[6] A NOTE ON MODIFIED CROSS-VALIDATION IN DENSITY-ESTIMATION
FELUCH, W
KORONACKI, J
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 13 (02) : 143 - 151
[7] Cross-validation for comparing multiple density estimation procedures
Lian, Heng
STATISTICS & PROBABILITY LETTERS, 2009, 79 (01) : 112 - 115
[8] Nonparametric density estimation by exact leave-p-out cross-validation
Celisse, Alain
Robin, Stephane
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (05) : 2350 - 2368
[9] Error estimation based on variance analysis of k-fold cross-validation
Jiang, Gaoxia
Wang, Wenjian
PATTERN RECOGNITION, 2017, 69 : 94 - 106
[10] Nonparametric tilted density function estimation: A cross-validation criterion
Doosti, Hassan
Hall, Peter
Mateu, Jorge
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2018, 197 : 51 - 68
