High dimensional semiparametric latent graphical model for mixed data

Citations: 71
Authors
Fan, Jianqing [1]
Liu, Han [1]
Ning, Yang [1]
Zou, Hui [2]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] Univ Minnesota, Minneapolis, MN USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
Discrete data; Gaussian copula; Latent variable; Mixed data; Non-paranormal; Rank-based statistic; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; MATRIX ESTIMATION; ARABIDOPSIS-THALIANA; GENE NETWORK; SPARSE; MINIMIZATION; REGRESSION; PATHWAY; LASSO;
DOI
10.1111/rssb.12168
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We propose a semiparametric latent Gaussian copula model for modelling mixed multivariate data, which contain a combination of both continuous and binary variables. The model assumes that the observed binary variables are obtained by dichotomizing latent variables that satisfy the Gaussian copula distribution. The goal is to infer the conditional independence relationship between the latent random variables, based on the observed mixed data. Our work has two main contributions: we propose a unified rank-based approach to estimate the correlation matrix of latent variables; we establish the concentration inequality of the proposed rank-based estimator. Consequently, our methods achieve the same rates of convergence for precision matrix estimation and graph recovery, as if the latent variables were observed. The methods proposed are numerically assessed through extensive simulation studies, and real data analysis.
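As a hedged illustration of the rank-based idea described in the abstract, the sketch below covers only the continuous-continuous case. It relies on the classical Gaussian copula identity sigma = sin(pi/2 * tau), which maps Kendall's tau between two observed continuous variables to the correlation of their latent Gaussian counterparts. The function names and the simulation setup are assumptions made for illustration, not the authors' code. Pairs involving binary variables need different bridge functions, which the abstract does not spell out and which are not attempted here.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau: average of sign(x_i - x_j) * sign(y_i - y_j) over pairs i < j."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            # (d > 0) - (d < 0) is the sign of d as an integer in {-1, 0, 1}
            total += ((dx > 0) - (dx < 0)) * ((dy > 0) - (dy < 0))
    return 2.0 * total / (n * (n - 1))


def latent_correlation_continuous(x, y):
    """Rank-based latent correlation for two continuous variables.

    Assumes a Gaussian copula, under which sigma = sin(pi/2 * tau).
    """
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


def simulate_pair(n, rho, seed=0):
    """Hypothetical test data: latent Gaussians with correlation rho,
    observed only through strictly increasing transforms."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(math.exp(z1))  # monotone transform of the first latent variable
        ys.append(z2 ** 3)       # monotone transform of the second latent variable
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate_pair(n=500, rho=0.6, seed=1)
    print("estimated latent correlation:", latent_correlation_continuous(xs, ys))
```

Because Kendall's tau depends on the data only through ranks, the estimate is unchanged by the unknown monotone transforms, which is why the latent correlation can be recovered without estimating the marginal transformations.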
Pages: 405-421
Page count: 17