Bayesian Model Selection via Composite Likelihood for High-dimensional Data Integration

Authors
Zhang, Guanlin [1]
Wu, Yuehua [1]
Gao, Xin [1]
Affiliation
[1] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bayesian method; data integration; Gibbs sampling; model selection; sub-Gaussian; subexponential; union support recovery; variable selection; consistency; inference; priors
DOI
10.1002/cjs.11800
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
We consider data integration problems where correlated data are collected from multiple platforms. Within each platform, there are linear relationships between the responses and a collection of predictors. We extend the linear models to include random errors coming from a much wider family of sub-Gaussian and subexponential distributions. The goal is to select important predictors across multiple platforms, where the number of predictors and the number of observations both increase to infinity. We combine the marginal densities of the responses obtained from different platforms to form a composite likelihood and propose a model selection criterion based on Bayesian composite posterior probabilities. Under some regularity conditions, we prove that the model selection criterion consistently recovers the union support of the predictors, even when the true model size diverges.
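As a rough illustration of the construction the abstract describes, the sketch below writes out one generic form of a composite likelihood built from platform-wise marginal densities, together with a composite posterior over candidate models. The notation (K platforms indexed by k, densities f_k, prior \pi, candidate model s) is assumed here for illustration and is not taken from the paper, whose exact criterion and priors may differ.

```latex
% Illustrative notation only; the paper's own definitions may differ.
% Platform-wise linear models, k = 1, ..., K, with sub-Gaussian or
% subexponential errors \varepsilon_k:
\[
  y_k = X_k \beta_k + \varepsilon_k
\]
% Composite likelihood: product of the marginal densities of the
% responses from each platform (dependence across platforms is ignored).
\[
  \mathrm{CL}(\theta) = \prod_{k=1}^{K} f_k\bigl(y_k \mid X_k, \theta_k\bigr),
  \qquad \theta = (\theta_1, \dots, \theta_K)
\]
% Composite posterior probability of a candidate model s
% (assumed form: prior on s times the integrated composite likelihood).
\[
  \pi_c\bigl(s \mid y_1, \dots, y_K\bigr) \;\propto\;
  \pi(s) \int \mathrm{CL}(\theta_s)\, \pi(\theta_s \mid s)\, d\theta_s
\]
% Union support: predictors active in at least one platform.
\[
  S^{*} = \bigcup_{k=1}^{K} \bigl\{\, j : \beta_{kj} \neq 0 \,\bigr\}
\]
```

In this notation, selection consistency would mean that the model maximizing \pi_c coincides with S^{*} with probability tending to one as the number of observations and the number of predictors grow.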
Pages: 924-938
Page count: 15