Constructing effective personalized policies using counterfactual inference from biased data sets with many features

Cited by: 5
Authors
Atan, Onur [1]
Zame, William R. [1,2]
Feng, Qiaojun [3]
van der Schaar, Mihaela [1,4]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
[2] Univ Oxford, Nuffield Coll, Oxford, England
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Univ Oxford, Oxford Man Inst, Oxford, England
Keywords
Inferring counterfactuals; Identifying relevant features; Constructing personalized policies; FEATURE-SELECTION;
DOI
10.1007/s10994-018-5768-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a novel approach for constructing effective personalized policies when the observed data lacks counterfactual information, is biased, and possesses many features. The approach is applicable in a wide variety of settings, from healthcare to advertising to education to finance. These settings have in common that the decision maker can observe, for each previous instance, an array of features of the instance, the action taken in that instance, and the reward realized, but not the rewards of actions that were not taken (the counterfactual information). Learning in such settings is made even more difficult because the observed data is typically biased by the existing policy (that generated the data) and because the array of features that might affect the reward in a particular instance, and hence should be taken into account in deciding on an action in each particular instance, is often vast. The approach presented here estimates propensity scores for the observed data, infers counterfactuals, identifies a (relatively small) number of features that are (most) relevant for each possible action and instance, and prescribes a policy to be followed. Comparison of the proposed algorithm against state-of-the-art algorithms on actual datasets demonstrates that the proposed algorithm achieves a significant improvement in performance.
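The role of propensity scores in correcting logging-policy bias can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it is a plain inverse-propensity-score (IPS) estimator, and it assumes, as a simplification, that propensities depend only on the action (not on features) and are estimated from empirical action frequencies. The function names `estimate_propensities` and `ips_value` are hypothetical.

```python
import numpy as np


def estimate_propensities(actions, n_actions):
    """Estimate P(action) from logged data as empirical action frequencies.

    Simplifying assumption: the logging policy ignores features.
    """
    counts = np.bincount(actions, minlength=n_actions)
    return counts / counts.sum()


def ips_value(actions, rewards, propensities, policy_actions):
    """IPS estimate of the mean reward a target policy would obtain.

    Only instances where the logged action matches the target policy's
    action contribute, each reweighted by 1 / propensity of that action.
    """
    match = (actions == policy_actions).astype(float)
    weights = match / propensities[actions]
    return float(np.mean(weights * rewards))


# Toy logged data: the logging policy chose action 0 three times out of four.
logged_actions = np.array([0, 0, 0, 1])
logged_rewards = np.array([1.0, 0.0, 1.0, 1.0])
props = estimate_propensities(logged_actions, n_actions=2)

# Evaluate two constant target policies: "always 0" and "always 1".
value_always_0 = ips_value(logged_actions, logged_rewards, props, np.zeros(4, dtype=int))
value_always_1 = ips_value(logged_actions, logged_rewards, props, np.ones(4, dtype=int))
```

In this toy data, the reweighting lets the single logged instance of action 1 stand in for all four instances when evaluating the "always 1" policy, which is how IPS compensates for an action being under-represented in the log.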
Pages: 945-970
Page count: 26