Improved baselines for causal structure learning on interventional data

Cited by: 0
Authors
Robin Richter
Shankar Bhamidi
Sach Mukherjee
Affiliations
[1] Deutsches Zentrum für Neurodegenerative Erkrankungen e.V. (DZNE), Statistics and Machine Learning
[2] University of North Carolina, Department of Statistics and Operations Research
[3] University of Cambridge, MRC Biostatistics Unit
Source
Statistics and Computing | 2023 / Vol. 33
Keywords
Causality; Causal structure learning; Interventional data; Transitively closed graphs; Gene regulatory networks; Null models
DOI
Not available
Abstract
Causal structure learning (CSL) refers to the estimation of causal graphs from data. Causal versions of tools such as ROC curves play a prominent role in empirical assessment of CSL methods and performance is often compared with “random” baselines (such as the diagonal in an ROC analysis). However, such baselines do not take account of constraints arising from the graph context and hence may represent a “low bar”. In this paper, motivated by examples in systems biology, we focus on assessment of CSL methods for multivariate data where part of the graph structure is known via interventional experiments. For this setting, we put forward a new class of baselines called graph-based predictors (GBPs). In contrast to the “random” baseline, GBPs leverage the known graph structure, exploiting simple graph properties to provide improved baselines against which to compare CSL methods. We discuss GBPs in general and provide a detailed study in the context of transitively closed graphs, introducing two conceptually simple baselines for this setting, the observed in-degree predictor (OIP) and the transitivity assuming predictor (TAP). While the former is straightforward to compute, for the latter we propose several simulation strategies. Moreover, we study and compare the proposed predictors theoretically, including a result showing that the OIP outperforms in expectation the “random” baseline on a subclass of latent network models featuring positive correlation among edge probabilities. Using both simulated and real biological data, we show that the proposed GBPs outperform random baselines in practice, often substantially. Some GBPs even outperform standard CSL methods (whilst being computationally cheap in practice). Our results provide a new way to assess CSL methods for interventional data.
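As a rough illustration of the idea behind a graph-based baseline, the following Python sketch scores each unobserved candidate edge (i, j) by the in-degree of j among the edges already known from interventions, and evaluates that score with a rank-based AUC. This is only a sketch under stated assumptions: the abstract does not give the exact definition of the OIP, so the scoring rule, the helper names (`transitive_closure`, `in_degree_scores`, `auc`) and the toy graph are all hypothetical and may differ from the paper's construction.

```python
def transitive_closure(n, edges):
    """Return the transitive closure of a directed graph on nodes 0..n-1
    as a set of (i, j) pairs with i != j (Warshall-style reachability)."""
    reach = [[False] * n for _ in range(n)]
    for i, j in edges:
        reach[i][j] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return {(i, j) for i in range(n) for j in range(n) if i != j and reach[i][j]}


def in_degree_scores(n, true_edges, intervened):
    """ASSUMED baseline rule (not taken from the paper): score each
    unobserved pair (i, j) by the in-degree of j counted over edges
    leaving intervened nodes, i.e. over the known part of the graph."""
    in_deg = [0] * n
    for i, j in true_edges:
        if i in intervened:  # only edges out of intervened nodes are known
            in_deg[j] += 1
    return {
        (i, j): in_deg[j]
        for i in range(n) if i not in intervened
        for j in range(n) if j != i
    }


def auc(scores, labels):
    """Rank-based area under the ROC curve; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy graph: 0 -> 1 -> 2 and 3 -> 2; nodes 0 and 1 intervened.
    closure = transitive_closure(4, [(0, 1), (1, 2), (3, 2)])
    scores = in_degree_scores(4, closure, intervened={0, 1})
    pairs = sorted(scores)
    print(auc([scores[p] for p in pairs], [p in closure for p in pairs]))
```

A score that ignores the graph entirely (for example, a constant) gives an AUC of 0.5 under this tie convention, which corresponds to the diagonal of the ROC plot that the abstract calls the "random" baseline.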