Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework

Times cited: 15
Authors
Shi, Chengchun [1]
Wang, Xiaoyu [2]
Luo, Shikai [3]
Zhu, Hongtu [4]
Ye, Jieping [5]
Song, Rui [6]
Affiliations
[1] London School of Economics and Political Science, London, England
[2] Chinese Academy of Sciences, Academy of Mathematics and Systems Science, Key Laboratory of Systems and Control, Beijing, People's Republic of China
[3] ByteDance, Beijing, People's Republic of China
[4] University of North Carolina, Chapel Hill, NC 27515, USA
[5] University of Michigan, Ann Arbor, MI 48109, USA
[6] North Carolina State University, Raleigh, NC, USA
Keywords
A/B testing; Causal inference; Online experiment; Online updating; Reinforcement learning; Sequential testing; Treatment regimes; Inference
DOI
10.1080/01621459.2022.2027776
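Chinese Library Classification (CLC) numbers
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714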
Abstract
A/B testing, or online experimentation, is a standard business strategy for comparing a new product with an old one in the pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments on two-sided marketplace platforms (e.g., Uber), where there is only one unit that receives a sequence of treatments over time. In these experiments, the treatment at a given time affects the current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework for carrying out A/B testing in these experiments while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world data example obtained from a technological company to illustrate its advantage over current practice. A Python implementation of our test is available at . Supplementary materials for this article are available online.
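The carryover problem described in the abstract (a treatment affecting both the current outcome and future outcomes) can be illustrated with a toy simulation. The sketch below assumes a hypothetical linear state model; the parameters `a`, `b`, `c` and the functions `simulate_average_outcome` and `long_term_effect_closed_form` are illustrative names, not taken from the article. It also assumes the long-term effect is measured as the difference in long-run average outcomes between always treating and never treating. It is not the authors' testing procedure.

```python
import random
import statistics


def simulate_average_outcome(treat, n_steps=100_000, a=0.5, b=1.0, c=0.2,
                             noise_sd=1.0, burn_in=1_000, seed=0):
    """Long-run average outcome when one fixed action is applied at every step.

    Hypothetical toy model (an illustration, not the article's model):
        Y_t     = S_t + c * A_t + noise        (immediate effect c)
        S_{t+1} = a * S_t + b * A_t + noise    (carryover through the state)
    """
    rng = random.Random(seed)  # same seed in both arms = common random numbers
    action = 1 if treat else 0
    state = 0.0
    outcomes = []
    for t in range(burn_in + n_steps):
        y = state + c * action + rng.gauss(0.0, noise_sd)
        if t >= burn_in:  # discard the transient before the state settles
            outcomes.append(y)
        state = a * state + b * action + rng.gauss(0.0, noise_sd)
    return statistics.fmean(outcomes)


def long_term_effect_closed_form(a=0.5, b=1.0, c=0.2):
    """Always-treat minus always-control mean outcome in the toy model.

    For |a| < 1 the stationary mean of S_t is b / (1 - a) under always-treat
    and 0 under always-control, so the difference in mean outcomes is
    c + b / (1 - a).
    """
    return c + b / (1 - a)


if __name__ == "__main__":
    effect = simulate_average_outcome(True) - simulate_average_outcome(False)
    print(f"simulated long-term effect: {effect:.3f}")
    print(f"closed form:                {long_term_effect_closed_form():.3f}")
    print(f"immediate effect only (c):  {0.2:.3f}")
```

With the default values, the immediate effect is only c = 0.2, whereas the long-term effect is c + b/(1 - a) = 0.2 + 2.0 = 2.2, so an analysis that looks only at the immediate effect would understate the treatment's impact roughly tenfold. Because both arms share the same seed, the noise cancels in the difference and the simulated effect agrees with the closed form up to floating-point error.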
Pages: 2059-2071
Number of pages: 13