Model selection is one of the most central tasks in supervised learning. Validation set methods are the standard way to accomplish this task: models are trained on training data, and the model with the smallest loss on the validation data is selected. However, it is generally not obvious how much validation data is required to make a reliable selection, which is essential when labeled data are scarce or expensive. We propose a bootstrap-based algorithm, bootstrap validation (BSV), that uses the bootstrap to adjust the validation set size and to find the best-performing model within a tolerance parameter specified by the user. We find that BSV works well in practice and can be used as a drop-in replacement for validation set methods or k-fold cross-validation. The main advantage of BSV is that less validation data is typically needed, so more data can be used to train the model, resulting in better approximations and efficient use of validation data.
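The abstract does not spell out the algorithm, but the core idea it describes can be sketched as follows. This is a hedged illustration, not the authors' exact BSV procedure: given per-example validation losses for two candidate models, the bootstrap is used to estimate the uncertainty of the loss gap, and the validation set is deemed large enough once that uncertainty falls within a user-specified tolerance (the function names, `n_boot`, and `tol` are illustrative choices, not from the paper).

```python
# Hedged sketch of the bootstrap-validation idea (NOT the authors' exact BSV
# algorithm): bootstrap the mean loss gap between two models over the
# validation set, and declare the validation set sufficient when the
# bootstrap confidence interval is narrower than a user tolerance.
import random


def bootstrap_gap_ci(losses_a, losses_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean loss gap (model A minus model B)."""
    rng = random.Random(seed)
    n = len(losses_a)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample validation indices
        gaps.append(sum(losses_a[i] - losses_b[i] for i in idx) / n)
    gaps.sort()
    lo = gaps[int(alpha / 2 * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def enough_validation_data(losses_a, losses_b, tol=0.01):
    """True once the bootstrap CI of the gap is resolved within +/- tol,
    i.e. no more validation data is needed to compare the two models."""
    lo, hi = bootstrap_gap_ci(losses_a, losses_b)
    return hi - lo <= 2 * tol
```

In this reading, when `enough_validation_data` returns `False`, more labeled examples would be moved into the validation split; otherwise the remaining data can be kept for training, which matches the abstract's claim that BSV frees data for training.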
Affiliation:
CNRS, Willow Project Team, Lab Informat, CNRS ENS INRIA UMR 8548, Ecole Normale Super, 23 Ave Italie, F-75214 Paris 13, France
Arlot, Sylvain
Celisse, Alain
Affiliation:
Univ Lille 1, CNRS, UMR 8524, Lab Math Paul Painleve, F-59655 Villeneuve, France