MODEL FREE ESTIMATION OF GRAPHICAL MODEL USING GENE EXPRESSION DATA

Cited by: 2
Authors
Yang, Jenny [1]
Liu, Yang [2]
Liu, Yufeng [3]
Sun, Wei [4]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27515 USA
[2] Wright State Univ, Dept Math & Stat, Dayton, OH 45435 USA
[3] Univ N Carolina, Lineberger Comprehens Canc Ctr, Carolina Ctr Genome Sci, Dept Stat & Operat Res,Dept Genet,Dept Biostat, Chapel Hill, NC 27515 USA
[4] Fred Hutchinson Canc Res Ctr, Publ Hlth Sci Div, 1124 Columbia St, Seattle, WA 98104 USA
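Keywords
Directed acyclic graphs; graphical models; model free; regression
DOI
10.1214/20-AOAS1380
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract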
Graphical models are a powerful and popular approach for studying high-dimensional omic data, such as genome-wide gene expression data. Nonlinear relations between genes are widely documented. However, partly because of the sparsity of data points in high-dimensional space (i.e., the curse of dimensionality) and computational challenges, most available methods construct graphical models by testing linear relations. We propose to address this challenge with a two-step approach: first, use a model-free approach to prioritize the neighborhood of each gene; then, apply a nonparametric conditional independence testing method to refine the neighborhood estimate. Our method, named "mofreds" (MOdel FRee Estimation of DAG Skeletons), estimates the skeleton of a directed acyclic graph (DAG) using this two-step approach. We studied the theoretical properties of mofreds and evaluated its performance in extensive simulation settings. We found that mofreds performs substantially better than the state-of-the-art method, which is designed to detect linear relations in Gaussian graphical models. We applied mofreds to gene expression data from breast cancer patients in The Cancer Genome Atlas (TCGA) and found that it discovers nonlinear relationships among gene pairs that are missed by Gaussian graphical model methods.
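The two-step idea in the abstract can be illustrated with a minimal Python sketch. This is not the mofreds implementation: the abstract does not say which model-free measure or which conditional independence test the authors use, so the sketch assumes distance correlation for the screening step and a residual-based permutation check as a rough stand-in for the refinement step. All function names (`distance_correlation`, `screen_neighbors`, `knn_residual`, `cond_indep_pvalue`) and their parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays (assumed screening measure)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute distances within each variable.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each distance matrix (subtract row and column means, add grand mean).
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_neighbors(X, target, k):
    """Step 1 (assumed form): rank the other columns of X by dependence on `target`."""
    scores = {
        j: distance_correlation(X[:, target], X[:, j])
        for j in range(X.shape[1])
        if j != target
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def knn_residual(v, z, k=10):
    """Residual of v after a k-nearest-neighbour smooth on a 1-D conditioning variable z."""
    v = np.asarray(v, dtype=float)
    z = np.asarray(z, dtype=float)
    # Indices of the k closest points in z for every sample (each sample includes itself).
    nearest = np.argsort(np.abs(z[:, None] - z[None, :]), axis=1)[:, :k]
    return v - v[nearest].mean(axis=1)


def cond_indep_pvalue(x, y, z, n_perm=200, seed=0):
    """Step 2 (stand-in): permutation p-value for dependence of x and y after smoothing out z."""
    rng = np.random.default_rng(seed)
    rx, ry = knn_residual(x, z), knn_residual(y, z)
    observed = distance_correlation(rx, ry)
    null = [distance_correlation(rx, rng.permutation(ry)) for _ in range(n_perm)]
    # Add-one correction keeps the p-value strictly positive.
    return (1 + sum(s >= observed for s in null)) / (1 + n_perm)
```

In this sketch, a pair such as `y = x**2 + noise` with `x` symmetric around zero has a Pearson correlation near zero but a clearly positive distance correlation, which is the kind of nonlinear relation a linear test would miss. A candidate edge kept by `screen_neighbors` would be dropped when `cond_indep_pvalue` stays large for some conditioning variable; conditioning sets with more than one variable are outside the scope of this illustration.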
Pages: 194-207
Page count: 14