Estimating the Probability of Rare Events Occurring Using a Local Model Averaging

Cited by: 2
Authors
Chen, Jin-Hua [1]
Chen, Chun-Shu [2]
Huang, Meng-Fan [2]
Lin, Hung-Chih [3, 4]
Affiliations
[1] Taipei Med Univ, Coll Management, Ctr Biostat, Master Program Big Data Technol & Management, Taipei, Taiwan
[2] Natl Changhua Univ Educ, Inst Stat & Informat Sci, Changhua, Taiwan
[3] China Med Univ, Children Hosp, Taichung, Taiwan
[4] China Med Univ, Sch Chinese Med, Taichung, Taiwan
Keywords
Kullback-Leibler loss; logistic regression; maximum likelihood estimate; uncertainty; necrotizing enterocolitis; selection; risk; regression
DOI
10.1111/risa.12558
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. When one of the two outcomes is rare, however, the estimates of the model parameters have been shown to be severely biased, and hence estimates of the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a single best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria, which yields different probability estimates of rare events occurring. An approximately unbiased estimator of Kullback-Leibler loss is then used to choose the best among them. We design comprehensive simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed.
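The two ingredients named in the abstract, averaging probability estimates across candidate models and scoring an estimate by Kullback-Leibler loss, can be illustrated with a minimal sketch. This is not the authors' local model averaging procedure (the data perturbation step and the approximately unbiased loss estimator are not reproduced here). It assumes plain Akaike weights for the averaging, and every log-likelihood, parameter count, and probability below is a hypothetical value chosen only for illustration.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion for one fitted model."""
    return -2.0 * log_lik + 2.0 * n_params


def akaike_weights(aic_values):
    """Turn AIC values into normalized model weights (smaller AIC, larger weight)."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


def average_probability(probs, weights):
    """Weighted average of the event probabilities predicted by each model."""
    return sum(w * p for w, p in zip(weights, probs))


def bernoulli_kl(p_true, p_est):
    """KL divergence from Bernoulli(p_true) to Bernoulli(p_est)."""
    eps = 1e-12
    q = min(max(p_est, eps), 1.0 - eps)  # keep the logarithms finite
    kl = 0.0
    if p_true > 0.0:
        kl += p_true * math.log(p_true / q)
    if p_true < 1.0:
        kl += (1.0 - p_true) * math.log((1.0 - p_true) / (1.0 - q))
    return kl


# Hypothetical summaries of three candidate logistic regression models.
log_liks = [-52.3, -50.1, -49.8]   # assumed maximized log-likelihoods
n_params = [2, 3, 4]               # assumed numbers of coefficients
probs = [0.010, 0.018, 0.025]      # assumed rare-event probabilities at one covariate value

aics = [aic(ll, k) for ll, k in zip(log_liks, n_params)]
weights = akaike_weights(aics)
p_avg = average_probability(probs, weights)

# In a simulation the true probability is known, so the loss can be computed directly.
p_true = 0.02                      # assumed true probability
loss_avg = bernoulli_kl(p_true, p_avg)
loss_each = [bernoulli_kl(p_true, p) for p in probs]
```

In practice the true probability is unknown, which is why the abstract relies on an estimator of the Kullback-Leibler loss rather than on the loss itself; the direct computation above only makes sense inside a simulation study.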
Pages: 1855-1870
Number of pages: 16