Model selection with bootstrap validation

Cited by: 2
Authors
Savvides, Rafael [1]
Makela, Jarmo [1]
Puolamaki, Kai [1,2]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[2] Univ Helsinki, Inst Atmospher & Earth Syst Res, Helsinki, Finland
Funding
Academy of Finland
Keywords
bootstrap; model selection; cross-validation
DOI
10.1002/sam.11606
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Model selection is one of the most central tasks in supervised learning. Validation set methods are the standard way to accomplish this task: models are trained on training data, and the model with the smallest loss on the validation data is selected. However, it is generally not obvious how much validation data is required to make a reliable selection, which is essential when labeled data are scarce or expensive. We propose a bootstrap-based algorithm, bootstrap validation (BSV), that uses the bootstrap to adjust the validation set size and to find the best-performing model within a tolerance parameter specified by the user. We find that BSV works well in practice and can be used as a drop-in replacement for validation set methods or k-fold cross-validation. The main advantage of BSV is that less validation data is typically needed, so more data can be used to train the model, resulting in better approximations and efficient use of validation data.
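The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' BSV algorithm, whose details are not given in this record. It is a simplified illustration that assumes per-example validation losses are already available for every candidate model. The function names `bootstrap_select` and `smallest_sufficient_size`, the tolerance `eps`, the confidence level `1 - delta`, and the growth step `step` are all hypothetical choices made for this example.

```python
import numpy as np


def bootstrap_select(losses, eps=0.01, n_boot=1000, seed=0):
    """Pick the model with the smallest mean validation loss and estimate,
    by bootstrap resampling of the validation examples, how often that model
    is within `eps` of the best model in the resample.

    losses: array of shape (n_validation, n_models), per-example losses.
    Returns (best_model_index, estimated_confidence).
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[0]
    best = int(np.argmin(losses.mean(axis=0)))
    # Resample validation rows with replacement: index shape (n_boot, n).
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = losses[idx].mean(axis=1)  # shape (n_boot, n_models)
    # Gap between the selected model and the best model in each resample.
    gap = boot_means[:, best] - boot_means.min(axis=1)
    return best, float(np.mean(gap <= eps))


def smallest_sufficient_size(losses, eps=0.01, delta=0.05, step=50, **kwargs):
    """Grow the validation set in increments of `step` (an assumed schedule)
    and stop once the bootstrap confidence reaches 1 - delta.
    Returns (validation_size_used, best_model_index, confidence)."""
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[0]
    for m in range(step, n + 1, step):
        best, conf = bootstrap_select(losses[:m], eps=eps, **kwargs)
        if conf >= 1 - delta:
            return m, best, conf
    # Fall back to using every available validation example.
    best, conf = bootstrap_select(losses, eps=eps, **kwargs)
    return n, best, conf


if __name__ == "__main__":
    # Synthetic losses for three hypothetical models; model 0 is best on average.
    rng = np.random.default_rng(1)
    demo = rng.normal(loc=[0.2, 0.5, 0.6], scale=0.1, size=(400, 3))
    print(smallest_sufficient_size(demo, eps=0.01, delta=0.05))
```

In this sketch, any validation examples beyond the returned size would be left over and could be moved to the training set, which is the benefit the abstract describes.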
Pages: 162-186
Page count: 25