Adaptive model selection and assessment for exponential family distributions

Cited by: 32
Authors
Shen, XT [1]
Huang, HC
Ye, J
Affiliations
[1] Univ Minnesota, Sch Stat, Minneapolis, MN 55455 USA
[2] Acad Sinica, Inst Stat Sci, Taipei 115, Taiwan
[3] CUNY Bernard M Baruch Coll, Stan Ross Dept Accountancy, New York, NY 10010 USA
Funding
National Science Foundation (USA);
Keywords
adaptive penalty; cross-validation; loss estimation; parametric and nonparametric regression; trees; variable selection;
DOI
10.1198/004017004000000338
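Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes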
020208 ; 070103 ; 0714 ;
Abstract
In many scientific and engineering problems, selecting the optimal model from a large pool of candidate models is important, particularly in data mining. Model assessment for non-normal distributions has so far received little attention in the literature. Indeed, many existing model selection criteria, such as the Bayes information criterion and C_p, may not be suitable when the conditional mean and variance of the response are dependent, as in generalized linear model regression. In this article we propose a new adaptive model selection criterion and construct an approximately unbiased Kullback-Leibler loss estimator for model assessment in the context of exponential family distributions. This permits comparison of arbitrarily complex modeling procedures. Our proposal uses generalized degrees of freedom, a concept that generalizes the one originally proposed for the normal distribution. The proposed procedure is implemented for the binomial and Poisson distributions, and its small-sample operating characteristics are examined via simulations. The usefulness of the method is demonstrated by an application to a study of the effect of air pollution on certain respiratory diseases. Numerical analyses support the utility of the methodology.
Pages: 306-317
Number of pages: 12