Model selection with bootstrap validation

Cited by: 2
Authors
Savvides, Rafael [1]
Makela, Jarmo [1]
Puolamaki, Kai [1, 2]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
[2] Univ Helsinki, Inst Atmospher & Earth Syst Res, Helsinki, Finland
Funding
Academy of Finland
Keywords
bootstrap; model selection; cross-validation
DOI
10.1002/sam.11606
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Model selection is one of the most central tasks in supervised learning. Validation set methods are the standard way to accomplish this task: models are trained on training data, and the model with the smallest loss on the validation data is selected. However, it is generally not obvious how much validation data is required to make a reliable selection, which is essential when labeled data are scarce or expensive. We propose a bootstrap-based algorithm, bootstrap validation (BSV), that uses the bootstrap to adjust the validation set size and to find the best-performing model within a tolerance parameter specified by the user. We find that BSV works well in practice and can be used as a drop-in replacement for validation set methods or k-fold cross-validation. The main advantage of BSV is that less validation data is typically needed, so more data can be used to train the model, resulting in better approximations and efficient use of validation data.
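The abstract does not spell out the BSV procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a matrix of per-example validation losses is already available, picks the model with the smallest mean validation loss, and uses bootstrap resampling of the validation examples to estimate how often that model is within a user-chosen tolerance of the best model. The function name `bootstrap_selection` and the parameters `tolerance`, `n_boot`, and `seed` are hypothetical names introduced here.

```python
import numpy as np


def bootstrap_selection(losses, tolerance, n_boot=1000, seed=0):
    """Illustrative sketch only (not the paper's BSV algorithm).

    losses: array of shape (n_val, n_models) holding the loss of each
            model on each validation example.
    Returns the index of the model with the smallest mean validation loss
    and the fraction of bootstrap resamples in which that model's mean
    loss is within `tolerance` of the best model in the resample.
    """
    rng = np.random.default_rng(seed)
    n_val, _ = losses.shape

    # Standard validation-set selection: smallest mean validation loss.
    chosen = int(np.argmin(losses.mean(axis=0)))

    # Resample validation examples (rows) with replacement.
    idx = rng.integers(0, n_val, size=(n_boot, n_val))
    boot_means = losses[idx].mean(axis=1)  # shape (n_boot, n_models)

    # Gap between the chosen model and the best model in each resample.
    gap = boot_means[:, chosen] - boot_means.min(axis=1)
    confidence = float(np.mean(gap <= tolerance))
    return chosen, confidence


if __name__ == "__main__":
    # Synthetic losses for three hypothetical models on 200 examples.
    rng = np.random.default_rng(1)
    demo = rng.normal(loc=[1.0, 0.8, 0.85], scale=0.5, size=(200, 3)) ** 2
    print(bootstrap_selection(demo, tolerance=0.05))
```

A low estimated fraction suggests the validation set is too small to separate the candidates at the chosen tolerance. How BSV itself adjusts the validation set size is described in the paper, not here.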
Pages: 162-186
Page count: 25