An Experimental and Theoretical Comparison of Model Selection Methods

Authors
Michael Kearns
Yishay Mansour
Andrew Y. Ng
Dana Ron
Affiliations
[1] AT&T Laboratories Research, Department of Computer Science
[2] Tel Aviv University, Department of Computer Science
[3] Carnegie Mellon University, Laboratory of Computer Science
[4] MIT
Source
Machine Learning | 1997 / Vol. 27
Keywords
model selection; complexity regularization; cross validation; minimum description length principle; structural risk minimization; VC dimension
DOI
Not available
Abstract
We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods — a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions:
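The abstract describes penalty-based rules (the family that contains GRM and MDL) and hold-out CV only in words. As a rough illustration, the Python sketch below applies both styles of rule to a toy problem. Everything concrete in it is an assumption made for illustration, not taken from the paper: the nested classes H_d of d equal-width bins on [0, 1), the hypothetical target in `draw_sample` (four alternating intervals plus label noise), the 30% hold-out fraction, and especially `illustrative_penalty`, a hypothetical penalty depending only on training error, d, and m that is not the GRM or MDL penalty analyzed in the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A labelled example: x drawn from [0, 1), boolean label y in {0, 1}.
Example = Tuple[float, int]


def bin_index(x: float, d: int) -> int:
    """Index of the equal-width bin (out of d) that contains x."""
    return min(int(x * d), d - 1)


def fit_bins(sample: Sequence[Example], d: int) -> List[int]:
    """Training-error minimizer over the illustrative class H_d:
    d equal-width bins, one constant label per bin (majority vote;
    ties and empty bins default to 0)."""
    ones = [0] * d
    counts = [0] * d
    for x, y in sample:
        b = bin_index(x, d)
        counts[b] += 1
        ones[b] += y
    return [1 if 2 * ones[b] > counts[b] else 0 for b in range(d)]


def error(h: List[int], sample: Sequence[Example]) -> float:
    """Fraction of examples in `sample` that hypothesis h mislabels."""
    d = len(h)
    wrong = sum(1 for x, y in sample if h[bin_index(x, d)] != y)
    return wrong / len(sample)


def illustrative_penalty(train_err: float, d: int, m: int) -> float:
    """Hypothetical penalty rule; NOT the paper's GRM or MDL formula."""
    return train_err + math.sqrt(d / m)


def penalty_select(sample: Sequence[Example], ds: Sequence[int],
                   penalty: Callable[[float, int, int], float]) -> int:
    """Penalty-based selection: fit each H_d on the whole sample and
    return the d with the smallest penalized training error."""
    m = len(sample)
    scores = {d: penalty(error(fit_bins(sample, d), sample), d, m) for d in ds}
    return min(scores, key=scores.get)


def holdout_cv_select(sample: Sequence[Example], ds: Sequence[int],
                      gamma: float = 0.3) -> int:
    """Hold-out CV: fit each H_d on the first (1 - gamma) fraction of the
    sample and return the d with the smallest error on the held-out rest."""
    cut = int(round((1 - gamma) * len(sample)))
    train, test = sample[:cut], sample[cut:]
    scores = {d: error(fit_bins(train, d), test) for d in ds}
    return min(scores, key=scores.get)


def draw_sample(m: int, noise: float, rng: random.Random) -> List[Example]:
    """Hypothetical target: 4 alternating intervals on [0, 1), with each
    label flipped independently with probability `noise`."""
    sample = []
    for _ in range(m):
        x = rng.random()
        y = int(x * 4) % 2
        if rng.random() < noise:
            y = 1 - y
        sample.append((x, y))
    return sample


if __name__ == "__main__":
    data = draw_sample(500, noise=0.1, rng=random.Random(0))
    ds = [1, 2, 4, 8, 16, 32, 64]
    print("penalty-based choice:", penalty_select(data, ds, illustrative_penalty))
    print("hold-out CV choice:  ", holdout_cv_select(data, ds))
```

Both selectors fit each H_d by exact training-error minimization (majority vote per bin); they differ only in how they score the fitted hypotheses. The penalty-based rule adds a complexity term to training error measured on all m examples, whereas hold-out CV gives up a fraction of the sample to measure error directly. The sequential split is valid here only because `draw_sample` produces independent examples.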
Pages: 7–50
Number of pages: 44