Composite Likelihood Bayesian Information Criteria for Model Selection in High-Dimensional Data

Cited by: 94
Authors
Gao, Xin [1]
Song, Peter X.-K. [2]
Affiliations
[1] York Univ, Dept Math & Stat, N York, ON M3J 1P3, Canada
[2] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
Keywords
Consistency; Model selection; Pseudo-likelihood; Variable selection; Approximations; Inference
DOI
10.1198/jasa.2010.tm09414
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
For high-dimensional data sets with complicated dependency structures, the full likelihood approach often leads to intractable computational complexity. This imposes difficulty on model selection, given that most traditionally used information criteria require evaluation of the full likelihood. We propose a composite likelihood version of the Bayes information criterion (BIC) and establish its consistency property for the selection of the true underlying marginal model. Our proposed BIC is shown to be selection-consistent under some mild regularity conditions, where the number of potential model parameters is allowed to increase to infinity at a certain rate of the sample size. Simulation studies demonstrate the empirical performance of this new BIC, especially for the scenario where the number of parameters increases with sample size. Technical proofs of our theoretical results are provided in the online supplemental materials.
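As a rough illustration of what a composite likelihood version of BIC looks like, the block below sketches a form commonly cited in the composite likelihood literature. This record does not reproduce the paper's formula, so every symbol here (the weights w_k, the matrices H and J, the effective dimension d*_M) is an assumption for illustration, and the paper's exact penalty for a growing number of parameters may differ.

```latex
% Sketch only: notation is assumed, not taken from this record.
% Composite log-likelihood built from low-dimensional component likelihoods L_k:
c\ell(\theta; Y) = \sum_{k} w_k \log L_k(\theta; Y)

% Criterion for a candidate model M with maximum composite likelihood estimate \hat\theta_M:
\mathrm{CL\text{-}BIC}(M) = -2\, c\ell(\hat\theta_M; Y) + d_M^{*} \log n,
\qquad
d_M^{*} = \operatorname{tr}\{ H(\hat\theta_M)^{-1} J(\hat\theta_M) \}

% Sensitivity and variability matrices of the composite score:
H(\theta) = E\{ -\nabla^2_\theta\, c\ell(\theta; Y) \},
\qquad
J(\theta) = \operatorname{Var}\{ \nabla_\theta\, c\ell(\theta; Y) \}
```

When the composite likelihood coincides with the full likelihood, H equals J, so d*_M reduces to the number of parameters and the expression becomes the ordinary BIC. The candidate model with the smallest value is the one selected.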
Pages: 1531-1540
Page count: 10