MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

Cited by: 0
Authors
Gao, Erdun [1]
Ng, Ignavier [2]
Gong, Mingming [1]
Shen, Li [3]
Huang, Wei [1]
Liu, Tongliang [4]
Zhang, Kun [2, 5]
Bondell, Howard [1]
Affiliations
[1] University of Melbourne, Parkville, Australia
[2] Carnegie Mellon University, Pittsburgh, PA, USA
[3] JD Explore Academy, Beijing, People's Republic of China
[4] University of Sydney, Sydney, Australia
[5] Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Funding
National Institutes of Health (USA); Australian Research Council
Keywords
Bayesian networks; EM algorithm; imputation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art causal discovery methods usually assume that the observational data is complete. However, missing data is pervasive in many practical settings such as clinical trials, economics, and biology. One straightforward way to address the problem is to first impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. Such a two-step approach may be suboptimal, because the imputation algorithm can introduce bias when modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of the observations under the expectation-maximization (EM) framework. In the E-step, when the posterior distributions of the parameters cannot be computed in closed form, Monte Carlo EM is used to approximate the likelihood. In the M-step, MissDAG uses the density transformation to model the noise distributions with simpler, more specific formulations by virtue of the ANMs, and applies a likelihood-based causal discovery algorithm with a directed acyclic graph constraint. We demonstrate the flexibility of MissDAG in incorporating various causal discovery algorithms, and its efficacy, through extensive simulations and real-data experiments.
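The EM idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes jointly Gaussian variables and ignorable missingness (missing entries coded as NaN), so the E-step expectations are available in closed form, and the function name `em_gaussian_missing` is hypothetical. The M-step here only re-estimates the mean and covariance; in a MissDAG-style pipeline, a likelihood-based, DAG-constrained learner would presumably consume these expected statistics instead.

```python
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    """EM estimate of the mean and covariance of Gaussian data with NaN-coded gaps.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    # Initialise from per-column statistics of the observed entries.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0) + 1e-6)
    for _ in range(n_iter):
        sum_x = np.zeros(d)
        sum_xx = np.zeros((d, d))
        # E-step: expected sufficient statistics given the observed entries.
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x_hat = X[i].copy()
            cov_fill = np.zeros((d, d))
            if m.any():
                if o.any():
                    # k = Sigma_mo @ inv(Sigma_oo), the regression of missing on observed.
                    k = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                    x_hat[m] = mu[m] + k @ (X[i, o] - mu[o])
                    # Conditional covariance of the missing block.
                    cov_fill[np.ix_(m, m)] = sigma[np.ix_(m, m)] - k @ sigma[np.ix_(o, m)]
                else:
                    # Fully missing row: fall back to the current marginal.
                    x_hat[m] = mu[m]
                    cov_fill[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x_hat
            sum_xx += np.outer(x_hat, x_hat) + cov_fill
        # M-step: maximum-likelihood update of mean and covariance.
        mu = sum_x / n
        sigma = sum_xx / n - np.outer(mu, mu)
    return mu, sigma
```

Adding the conditional covariance (`cov_fill`) to the second-moment statistic is what separates this from plain mean imputation, which would underestimate the variances of variables with many missing entries.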
Pages: 15