Consistency;
exponential family;
extended Bayes information criterion;
feature selection;
generalized linear model;
small-n-large-P;
VARIABLE SELECTION;
MODEL SELECTION;
MULTIPLE;
REGULARIZATION;
CRITERION;
LOCI;
DOI:
10.5705/ss.2010.216
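Chinese Library Classification (CLC) number:
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Discipline classification code:
020208;
070103;
0714;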
Abstract:
The small-n-large-P situation has become common in genetics research, medical studies, risk management, and other fields. Feature selection is crucial in these studies yet poses a serious challenge. Traditional criteria such as AIC, BIC, and cross-validation tend to select too many features. In this paper, we examine the variable selection problem under generalized linear models. We study an approach in which the prior over candidate models explicitly accounts for the small-n-large-P situation. The resulting criterion, the extended Bayes information criterion (EBIC), is shown to be selection consistent under generalized linear models. We also report simulation results and a data analysis to illustrate the effectiveness of EBIC for feature selection.
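To make the criterion concrete, here is a minimal Python sketch of how an EBIC score could be computed for one candidate model. It assumes the commonly cited form EBIC_γ = -2 log L + k log n + 2γ log C(P, k), which this record does not state explicitly. The function names `log_binomial` and `ebic`, and the arguments `loglik`, `n`, `p`, `k`, and `gamma`, are illustrative choices, not taken from the paper.

```python
import math


def log_binomial(p, k):
    """Log of the binomial coefficient C(p, k), computed via log-gamma
    so that it stays finite when p is very large."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(loglik, n, p, k, gamma=1.0):
    """EBIC score of a candidate model (assumed form, see lead-in).

    loglik : maximized log-likelihood of the fitted model
    n      : sample size
    p      : total number of candidate features (P)
    k      : number of features in this model
    gamma  : weight of the extra penalty; gamma = 0 gives ordinary BIC
    """
    bic_part = -2.0 * loglik + k * math.log(n)
    # Extra penalty: log of the number of models with exactly k features.
    extra_penalty = 2.0 * gamma * log_binomial(p, k)
    return bic_part + extra_penalty


# Hypothetical numbers, for illustration only.
score_bic = ebic(loglik=-50.0, n=100, p=1000, k=3, gamma=0.0)
score_ebic = ebic(loglik=-50.0, n=100, p=1000, k=3, gamma=1.0)
```

Under this assumed form, the extra term grows with the number of models of the same size, so when P is much larger than n a bigger model must improve the likelihood more than ordinary BIC would require before it is preferred.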