MPCSL - A Modular Pipeline for Causal Structure Learning

Cited by: 4
Authors
Huegle, Johannes [1 ]
Hagedorn, Christopher [1 ]
Perscheid, Michael [1 ]
Plattner, Hasso [1 ]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Source
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2021
Keywords
Causal Structure Learning; Modular Pipeline; Evaluation Framework; Benchmarking; GENE REGULATORY NETWORKS; BAYESIAN NETWORKS; INFERENCE; COMPETITIONS; MODEL;
DOI
10.1145/3447548.3467082
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The examination of causal structures is crucial for data scientists in a variety of machine learning application scenarios. In recent years, the corresponding interest in methods of causal structure learning has led to a wide spectrum of independent implementations, each having specific accuracy characteristics and introducing implementation-specific runtime overhead. Hence, considering a selection of algorithms, or different implementations written in different programming languages and using different hardware setups, becomes a tedious manual task with high setup costs. Consequently, a tool that allows existing methods from different libraries to be plugged into a single system, where their results can be compared and evaluated, provides substantial support for data scientists in their research efforts. In this work, we propose an architectural blueprint of a pipeline for causal structure learning and outline our reference implementation, MPCSL, which addresses the requirements of platform independence and modularity while ensuring the comparability and reproducibility of experiments. Moreover, we demonstrate the capabilities of MPCSL in a case study, in which we evaluate existing implementations of the well-known PC algorithm with respect to their runtime performance.
Pages: 3068-3076
Number of pages: 9