Causal Query in Observational Data with Hidden Variables

Cited by: 4
Authors
Cheng, Debo [1]
Li, Jiuyong [1]
Liu, Lin [1]
Liu, Jixue [1]
Yu, Kui [2]
Le, Thuc Duy [1]
Affiliations
[1] Univ South Australia, Sch Informat Technol & Math Sci, Mawson Lakes, SA 5095, Australia
[2] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Peoples R China
Source
ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020 / Vol. 325
Funding
National Science Foundation (USA);
Keywords
SELECTION; MARKOV; INFERENCE; MODELS;
DOI
10.3233/FAIA200390
Chinese Library Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper discusses the problem of causal query in observational data with hidden variables. The aim is to estimate the change in an outcome when "manipulating" a variable, given a set of plausible confounding variables that affect both the manipulated variable and the outcome. Such an "experiment on data" to estimate the causal effect of the manipulated variable is useful for validating an experiment design using historical data or for exploring confounders when studying a new relationship. However, existing data-driven methods for causal effect estimation face some major challenges, including poor scalability with high-dimensional data, low estimation accuracy due to the heuristics used by global causal structure learning algorithms, and the assumption of causal sufficiency when hidden variables are inevitable in data. In this paper, we develop theorems for using local search to find a superset of the adjustment (or confounding) variables for causal effect estimation from observational data under a realistic pretreatment assumption. The theorems ensure that the unbiased estimate of the causal effect is included in the set of causal effects estimated from the superset of adjustment variables. Based on the developed theorems, we propose a data-driven algorithm for causal query. Experiments show that the proposed algorithm is faster and produces better causal effect estimates than an existing data-driven causal effect estimation method with hidden variables. The causal effects estimated by the proposed algorithm are as accurate as those of state-of-the-art methods that use domain knowledge.
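The idea that the unbiased estimate lies somewhere in the set of effects obtained from a superset of adjustment variables can be illustrated with a minimal sketch. This is not the paper's algorithm: it replaces the local search with exhaustive enumeration of subsets, assumes a linear model with ordinary least squares adjustment, and uses simulated data. The names `ols_effect`, `effects_over_subsets`, `simulate`, `z1`, and `z2` are all hypothetical.

```python
from itertools import combinations

import numpy as np


def ols_effect(w, y, covariates):
    """Coefficient of w in a least-squares fit of y on [1, w, covariates]."""
    n = len(w)
    X = np.column_stack([np.ones(n), w] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])


def effects_over_subsets(w, y, candidates):
    """Estimate the effect of w on y once per subset of the candidate set.

    `candidates` maps a variable name to its data column. The result maps
    each subset (a tuple of names) to the adjusted effect estimate.
    """
    names = sorted(candidates)
    results = {}
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            cols = [candidates[name] for name in subset]
            results[subset] = ols_effect(w, y, cols)
    return results


def simulate(n=20000, true_effect=2.0, seed=0):
    """Assumed toy data: z1 confounds w and y; z2 only affects y."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n)
    w = 1.5 * z1 + rng.normal(size=n)
    y = true_effect * w + 3.0 * z1 + 1.0 * z2 + rng.normal(size=n)
    return w, y, {"z1": z1, "z2": z2}


if __name__ == "__main__":
    w, y, candidates = simulate()
    for subset, estimate in effects_over_subsets(w, y, candidates).items():
        print(subset, round(estimate, 3))
```

In this toy setting the subsets that contain `z1` recover an estimate close to the simulated effect, while the unadjusted estimate is biased. The sketch cannot say which member of the set is the unbiased one; it only shows that the set contains it.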
Pages: 2551-2558
Page count: 8