Model selection with bootstrap validation

Cited by: 2
Authors
Savvides, Rafael [1]
Makela, Jarmo [1]
Puolamaki, Kai [1,2]
Affiliations
[1] University of Helsinki, Department of Computer Science, Helsinki, Finland
[2] University of Helsinki, Institute for Atmospheric and Earth System Research, Helsinki, Finland
Funding
Academy of Finland
Keywords
bootstrap; model selection; cross-validation
DOI
10.1002/sam.11606
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Model selection is one of the central tasks in supervised learning. Validation set methods are the standard way to accomplish it: models are trained on training data, and the model with the smallest loss on the validation data is selected. However, it is generally not obvious how much validation data is required to make a reliable selection, and knowing this is essential when labeled data are scarce or expensive. We propose a bootstrap-based algorithm, bootstrap validation (BSV), that uses the bootstrap to adjust the validation set size and to find the best-performing model within a tolerance parameter specified by the user. We find that BSV works well in practice and can be used as a drop-in replacement for validation set methods or k-fold cross-validation. The main advantage of BSV is that less validation data is typically needed, so more data can be used to train the model, resulting in better approximations and more efficient use of validation data.
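The abstract does not spell out the BSV procedure, so the following is only a minimal sketch of the general idea it describes, under assumptions that are not taken from the paper: per-example validation losses are already available for every candidate model, the validation set is grown in fixed steps, and selection stops once the bootstrap estimate of "the chosen model is within the tolerance of the best model" reaches 1 - delta. The function names `bootstrap_select` and `adaptive_validation` and the parameters `tolerance`, `delta`, `start`, `step` and `n_boot` are hypothetical illustrations, not the authors' API.

```python
import random


def bootstrap_select(losses, tolerance, n_boot=1000, seed=0):
    """Pick the model with the lowest mean validation loss and estimate, by
    bootstrap resampling of validation examples, how often that choice stays
    within `tolerance` of the best model in the resample.

    losses: dict mapping model name -> per-example validation losses, all
            aligned on the same validation examples (hypothetical format).
    Returns (selected_model, confidence).
    """
    names = list(losses)
    n = len(losses[names[0]])
    rng = random.Random(seed)

    def mean_loss(name, idx):
        vals = losses[name]
        return sum(vals[i] for i in idx) / len(idx)

    # Select on the full validation set (ties go to the first model listed).
    selected = min(names, key=lambda m: mean_loss(m, range(n)))

    hits = 0
    for _ in range(n_boot):
        # Resample validation examples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        means = {m: mean_loss(m, idx) for m in names}
        # Count resamples where the selected model is within tolerance of the best.
        if means[selected] - min(means.values()) <= tolerance:
            hits += 1
    return selected, hits / n_boot


def adaptive_validation(losses, tolerance, delta=0.05, start=20, step=20,
                        n_boot=500, seed=0):
    """Grow the validation set until the bootstrap confidence reaches 1 - delta
    or the available validation data is exhausted (hypothetical stopping rule).
    Returns (selected_model, validation_size_used, confidence)."""
    n_total = len(next(iter(losses.values())))
    size = min(start, n_total)
    while True:
        # Use only the first `size` validation examples, as if labeling more on demand.
        subset = {m: v[:size] for m, v in losses.items()}
        selected, conf = bootstrap_select(subset, tolerance, n_boot, seed)
        if conf >= 1 - delta or size >= n_total:
            return selected, size, conf
        size = min(size + step, n_total)
```

For example, if model "a" has loss 0 and model "b" has loss 1 on every validation example, this sketch stops after the initial 20 examples and selects "a" with confidence 1.0; when two models are nearly indistinguishable, it keeps consuming validation data until the pool runs out. The stopping rule here is a simple illustrative choice; the paper's actual criterion may differ.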
Pages: 162-186
Number of pages: 25