PenPC: A Two-Step Approach to Estimate the Skeletons of High-Dimensional Directed Acyclic Graphs

Cited by: 23
Authors
Ha, Min Jin [1]
Sun, Wei [2, 4]
Xie, Jichun [3]
Affiliations
[1] Univ Texas Houston, MD Anderson Canc Ctr, Dept Biostat, 1515 Holcombe Blvd, Houston, TX 77030 USA
[2] Univ N Carolina, Dept Biostat, Dept Genet, Chapel Hill, NC 27514 USA
[3] Duke Univ, Dept Biostat & Bioinformat, Durham, NC 27708 USA
[4] Fred Hutchinson Canc Res Ctr, Publ Hlth Sci Div, 1124 Columbia St, Seattle, WA 98104 USA
Keywords
DAG; High dimensional; Log penalty; PC-algorithm; Penalized regression; Skeleton; PENALIZED LIKELIHOOD; MODEL SELECTION; NETWORKS
DOI
10.1111/biom.12415
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Estimation of the skeleton of a directed acyclic graph (DAG) is of great importance for understanding the underlying DAG, and causal effects can be assessed from the skeleton when the DAG is not identifiable. We propose a novel method named PenPC to estimate the skeleton of a high-dimensional DAG by a two-step approach. We first estimate the nonzero entries of a concentration matrix using penalized regression, and then fix the difference between the concentration matrix and the skeleton by evaluating a set of conditional independence hypotheses. For high-dimensional problems where the number of vertices p is on a polynomial or exponential scale of the sample size n, we study the asymptotic properties of PenPC on two types of graphs: traditional random graphs where all the vertices have the same expected number of neighbors, and scale-free graphs where a few vertices may have a large number of neighbors. As illustrated by extensive simulations and applications to gene expression data of cancer patients, PenPC has higher sensitivity and specificity than the state-of-the-art method, the PC-stable algorithm.
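The gap between the concentration matrix and the skeleton that the abstract mentions can be illustrated with a toy collider, X1 → X3 ← X2. The sketch below is not the PenPC implementation: it assumes Gaussian variables, works with population correlations instead of data, and replaces the penalized regression of step 1 with a simple significance check on the partial correlation. The function names, the sample size `n = 100`, and the Fisher z-test (the test commonly used in PC-type algorithms) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def partial_corr(r_ij, r_ik, r_jk):
    """First-order partial correlation of i and j given a single variable k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))


def is_dependent(r, n, cond_size, alpha=0.05):
    """Fisher z-test: True if the (partial) correlation r differs from zero."""
    z = math.atanh(r)  # Fisher z-transform
    stat = math.sqrt(n - cond_size - 3) * abs(z)
    return stat > NormalDist().inv_cdf(1 - alpha / 2)


# Population correlations for the toy model X3 = X1 + X2 + noise, where
# X1, X2 and noise are independent standard normals (so Var(X3) = 3).
r12 = 0.0
r13 = r23 = 1 / math.sqrt(3)

n = 100  # hypothetical sample size

# Stand-in for step 1: with three variables, the (1, 2) entry of the
# concentration matrix is nonzero exactly when the partial correlation of
# X1 and X2 given X3 is nonzero. Conditioning on the collider X3 induces
# dependence, so the pair becomes a candidate edge.
r12_given_3 = partial_corr(r12, r13, r23)  # about -0.5
candidate_edge = is_dependent(r12_given_3, n, cond_size=1)

# Stand-in for step 2: test the candidate edge against smaller conditioning
# sets. X1 and X2 are marginally independent, so the edge is removed.
skeleton_edge = candidate_edge and is_dependent(r12, n, cond_size=0)

print(candidate_edge, skeleton_edge)
```

Here the pair (X1, X2) appears in the support of the concentration matrix but is not an edge of the skeleton, which is the kind of discrepancy the second step is described as correcting.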
Pages: 146-155
Number of pages: 10