An Experimental and Theoretical Comparison of Model Selection Methods

Cited by: 0

Authors:

Michael Kearns

Yishay Mansour

Andrew Y. Ng

Dana Ron

Affiliations:

[1] AT&T Laboratories Research, Department of Computer Science

[2] Tel Aviv University, Department of Computer Science

[3] Carnegie Mellon University, Laboratory of Computer Science

[4] MIT

Source:

Machine Learning | 1997 / Vol. 27

Keywords:

model selection; complexity regularization; cross validation; minimum description length principle; structural risk minimization; VC dimension

Abstract:

We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods — a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions:
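
As a rough illustration of the two families of methods the abstract contrasts, the following Python sketch picks a hypothesis complexity d in two ways: a penalty-based rule that minimizes training error plus a complexity penalty, and hold-out cross validation that minimizes error on examples withheld from training. The function names, the error values, and in particular the penalty sqrt(d / m) are assumptions made for illustration only; they are not the GRM or MDL rules analyzed in the paper.

```python
import math


def select_by_penalty(train_errors, m, penalty):
    """Return the complexity d that minimizes train_errors[d] + penalty(d, m).

    train_errors maps each candidate complexity d to the observed training
    error of the best hypothesis of that complexity; m is the sample size.
    """
    return min(train_errors, key=lambda d: train_errors[d] + penalty(d, m))


def select_by_holdout(holdout_errors):
    """Return the complexity d with the lowest error on the held-out sample."""
    return min(holdout_errors, key=lambda d: holdout_errors[d])


def illustrative_penalty(d, m):
    # Hypothetical penalty that grows with complexity and shrinks with
    # sample size. It is NOT the GRM or MDL penalty from the paper.
    return math.sqrt(d / m)


# Hypothetical error curves: training error keeps falling as complexity
# grows, while held-out error falls and then rises again (overfitting).
m = 200
train_errors = {1: 0.30, 2: 0.18, 4: 0.10, 8: 0.06, 16: 0.02, 32: 0.00}
holdout_errors = {1: 0.31, 2: 0.20, 4: 0.14, 8: 0.15, 16: 0.19, 32: 0.24}

d_penalty = select_by_penalty(train_errors, m, illustrative_penalty)
d_holdout = select_by_holdout(holdout_errors)
print("penalty-based choice:", d_penalty)
print("hold-out choice:", d_holdout)
```

The two rules differ in what they use as a stand-in for generalization error: the penalty-based rule relies on a formula in d and m, whereas hold-out cross validation spends part of the sample on a direct estimate.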

Pages: 7 - 50

Number of pages: 43
