Combining linear regression models: When and how?

Cited: 159
Authors
Yuan, Z [1 ]
Yang, YH [2]
Affiliations
[1] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[2] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA);
Keywords
adaptive regression by mixing; Bayesian model averaging; instability index; model combining; model selection; model uncertainty; perturbation instability in estimation;
DOI
10.1198/016214505000000088
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Codes
020208 ; 070103 ; 0714 ;
Abstract
Model-combining (i.e., mixing) methods have been proposed in recent years to deal with uncertainty in model selection. Although the advantages of model combining over model selection have been demonstrated in simulations and data examples, it remains largely unclear when model combining should be preferred. In this work, we first propose an instability measure, perturbation instability in estimation (PIE), which captures the uncertainty of model selection in estimation through perturbation of the sample. We demonstrate that estimators from model selection can have large PIE values and that model combining substantially reduces the instability in such cases. Second, we propose a model combining method, adaptive regression by mixing with model screening (ARMS), and derive a theoretical property. In ARMS, a screening step narrows down the list of candidate models before combining, which not only saves computing time but can also improve estimation accuracy. Third, we compare ARMS with EBMA (an empirical Bayesian model averaging method) and with model selection methods in a number of simulations and real data examples. The comparison shows that model combining produces better estimators when the instability of model selection is high, and that ARMS performs better than EBMA in most such cases in our simulations. Regarding the choice between model selection and model combining, we propose a rule of thumb in terms of PIE. The empirical results support PIE as a sensible indicator of model selection instability in estimation, useful for judging whether model combining is a better choice than model selection for the data at hand.
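The abstract describes PIE as a measure built on perturbing the sample, redoing model selection, and checking how much the resulting estimate moves. The following is a minimal sketch of that general idea only, not the authors' exact definition from the paper: the helper names (`pie_sketch`, `select_and_predict`) are hypothetical, AIC-based best-subset selection is used as a stand-in selection rule, and the perturbation is Gaussian noise added to the response with size controlled by a factor `tau` times the estimated residual scale.

```python
import numpy as np


def fit_ls(X, y):
    """Ordinary least squares fit; returns the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def aic(X, y):
    """Gaussian AIC for a linear model, up to an additive constant."""
    n, p = X.shape
    resid = y - X @ fit_ls(X, y)
    rss = float(resid @ resid)
    return n * np.log(rss / n) + 2 * p


def select_and_predict(X, y, subsets):
    """Pick the candidate column subset with smallest AIC; return fitted values."""
    best = min(subsets, key=lambda s: aic(X[:, s], y))
    Xs = X[:, best]
    return Xs @ fit_ls(Xs, y)


def pie_sketch(X, y, subsets, tau=0.5, n_perturb=50, seed=0):
    """Rough PIE-style instability index (illustrative, not the paper's formula):
    average root-mean-square change in the selected model's fitted values when
    y is perturbed by N(0, (tau*sigma)^2) noise, scaled by the perturbation size."""
    rng = np.random.default_rng(seed)
    n = len(y)
    yhat = select_and_predict(X, y, subsets)          # estimate on original data
    sigma = np.std(y - yhat, ddof=1)                  # residual scale estimate
    shifts = []
    for _ in range(n_perturb):
        y_pert = y + rng.normal(0.0, tau * sigma, size=n)
        yhat_pert = select_and_predict(X, y_pert, subsets)
        shifts.append(np.linalg.norm(yhat_pert - yhat) / np.sqrt(n))
    return float(np.mean(shifts)) / (tau * sigma)
```

A large value of this index signals that small data perturbations change the selected model and its fit substantially, which is the regime where, per the abstract, model combining tends to beat model selection.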
Pages: 1202-1214
Page count: 13