SEMdag: Fast learning of Directed Acyclic Graphs via node or layer ordering

Times cited: 0
Authors
Grassi, Mario [1]
Tarantino, Barbara [1]
Affiliation
[1] Univ Pavia, Dept Brain & Behav Sci, Pavia, Italy
Source
PLOS ONE | 2025 / Vol. 20 / Issue 1
Keywords
BAYESIAN NETWORK STRUCTURE; CAUSAL DISCOVERY; PREDICTION; LIKELIHOOD; MODELS
DOI
10.1371/journal.pone.0317283
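Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)]
Discipline classification codes
07; 0710; 09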
Abstract
A Directed Acyclic Graph (DAG) offers a simple way to define causal structures among a set of observed variables (nodes): causal links are represented by arrows between variables, pointing from cause to effect. Learning DAG structure from observational data has recently attracted close attention from both industry and academia, and many techniques have been proposed to address the problem. We present a two-step approach, named SEMdag(), for quickly learning high-dimensional linear SEMs. It is included in the R package SEMgraph and performs a two-stage, order-based search using either prior knowledge (Knowledge-based, KB) or a data-driven method (Bottom-up, BU), under the assumption of a linear SEM with equal-variance error terms. We evaluated our framework's ability to find plausible DAGs against six well-known causal discovery techniques (ARGES, GES, PC, LiNGAM, CAM, NOTEARS). We conducted a series of experiments on observed expression (or RNA-seq) data, using a pair of training and testing datasets for each of four distinct diseases: Amyotrophic Lateral Sclerosis (ALS), Breast cancer (BRCA), Coronavirus disease (COVID-19) and ST-elevation myocardial infarction (STEMI). The results show that the SEMdag() procedure can recover graph structures with good disease-prediction performance, as evaluated by a conventional supervised learning algorithm (random forest, RF). When the initial graph is sparse, the BU approach may be a better choice than the KB one; when the graph is denser, both BU and KB perform well, with the highest score obtained by the KB approach based on topological layers. Besides its superior disease-prediction performance compared with previous research, SEMdag() lets the user choose among different structure learning algorithms and can handle high-dimensional problems at a lower computational cost. The SEMdag() function is implemented in the R package SEMgraph, available at https://CRAN.R-project.org/package=SEMgraph.
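To make the idea of a two-stage, order-based search concrete, here is a minimal sketch under stated assumptions; it is not the SEMdag() code, and every function name in it (bottom_up_layers, topological_layers, fit_edges) and the threshold tau are hypothetical. Stage one produces an ordering of the nodes. In the data-driven (BU-style) branch it relies on a property of the equal-variance linear SEM X = BX + e with Var(e) = s^2 I: the precision matrix is (I - B)^T (I - B) / s^2, so its diagonal entry for node j is (1 + sum of squared weights of the edges leaving j) / s^2, which is smallest exactly when j is a sink. Sinks are therefore peeled off one at a time. In the knowledge-based (KB-style) branch, a prior DAG is grouped into topological layers. Stage two regresses each node on all nodes in earlier layers; as an assumption for illustration, plain least squares on a known covariance matrix with a fixed coefficient threshold stands in for whatever sparse estimator a real implementation would use.

```python
"""Sketch of two-stage, order-based DAG learning for a linear SEM with
equal error variances. Hypothetical helper names; NOT the SEMdag() code."""


def inverse(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def submatrix(cov, idx):
    """Rows and columns of cov restricted to the indices in idx."""
    return [[cov[i][j] for j in idx] for i in idx]


def bottom_up_layers(cov):
    """BU-style ordering: repeatedly remove the node with the smallest
    precision-matrix diagonal (a sink under equal error variances).
    Returns singleton layers, sources first."""
    remaining = list(range(len(cov)))
    sinks = []
    while remaining:
        prec = inverse(submatrix(cov, remaining))
        k = min(range(len(remaining)), key=lambda i: prec[i][i])
        sinks.append(remaining.pop(k))
    return [[v] for v in reversed(sinks)]


def topological_layers(nodes, edges):
    """KB-style ordering: group the nodes of a prior DAG into topological
    layers (layer 0 holds the nodes without parents)."""
    parents = {v: {u for u, w in edges if w == v} for v in nodes}
    layers, placed = [], set()
    while len(placed) < len(nodes):
        layer = [v for v in nodes if v not in placed and parents[v] <= placed]
        if not layer:
            raise ValueError("prior graph is cyclic or references unknown nodes")
        layers.append(layer)
        placed.update(layer)
    return layers


def fit_edges(cov, layers, tau=0.1):
    """Stage two: least-squares regression of each node on all nodes in
    earlier layers; keep edges whose |coefficient| exceeds tau."""
    edges, earlier = [], []
    for layer in layers:
        for j in layer:
            if earlier:
                inv = inverse(submatrix(cov, earlier))
                s_pj = [cov[p][j] for p in earlier]
                beta = [sum(inv[r][c] * s_pj[c] for c in range(len(earlier)))
                        for r in range(len(earlier))]
                edges += [(p, j, round(b, 3))
                          for p, b in zip(earlier, beta) if abs(b) > tau]
        earlier = earlier + layer  # nodes in the same layer are never parents
    return edges


if __name__ == "__main__":
    # Population covariance of X1 -> X2 -> X3 with X2 = 0.8*X1 + e2,
    # X3 = 0.5*X2 + e3 and unit error variances, stored in the variable
    # order [X2, X3, X1], i.e. indices 0, 1, 2.
    cov = [[1.64, 0.82, 0.80],
           [0.82, 1.41, 0.40],
           [0.80, 0.40, 1.00]]
    bu = bottom_up_layers(cov)
    print(bu)                    # [[2], [0], [1]]  (X1, X2, X3)
    print(fit_edges(cov, bu))    # [(2, 0, 0.8), (0, 1, 0.5)]
    print(topological_layers([0, 1, 2], [(2, 0), (0, 1)]))  # [[2], [0], [1]]
```

Both branches feed the same second stage, which mirrors the KB/BU choice described in the abstract: only the source of the ordering changes. With real expression data the covariance would be estimated from samples, and a sparse regression (for example, the lasso) would normally replace the fixed threshold.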
Pages: 24