Non-stationary A/B Tests

Cited by: 3
Authors
Wu, Yuhang [1]
Zheng, Zeyu [1]
Zhang, Guangyu [2]
Zhang, Zuohua [2]
Wang, Chu [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Amazon, Seattle, WA USA
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Keywords
A/B test; non-stationarity; statistical inference; bias correction; variance reduction; central limit theorem
DOI
10.1145/3534678.3539325
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
A/B tests, also known as online controlled experiments, have been used at scale by data-driven enterprises to guide decisions and test innovative ideas. Meanwhile, nonstationarity, such as the time-of-day effect, commonly arises in various business metrics. We show that inadequately addressing nonstationarity can cause A/B tests to be statistically inefficient or invalid, leading to wrong conclusions. To address these issues, we develop a new framework that provides appropriate modeling and adequate statistical analysis for nonstationary A/B tests. Without changing the infrastructure of any existing A/B test procedure, we propose a new estimator that treats time as a continuous covariate and performs post-stratification with a sample-dependent number of stratification levels. We prove a central limit theorem in a natural limiting regime under nonstationarity, so that valid large-sample statistical inference is available. We show that the proposed estimator achieves the optimal asymptotic variance among all estimators. When the experiment design phase of an A/B test allows, we propose a new time-grouped randomization approach that better balances treatment and control assignments in the presence of time nonstationarity. Numerical experiments are briefly reported to illustrate the theoretical analysis.
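The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator or design: the equal-width time bins, the fixed `n_bins` (the paper lets the number of levels depend on the sample), the rule for skipping bins that lack one arm, and the function names `post_stratified_ate` and `time_grouped_assignment` are all assumptions made for illustration.

```python
import numpy as np


def post_stratified_ate(t, w, y, n_bins):
    """Post-stratified difference in means, using time as the stratifier.

    t: arrival times, w: 0/1 treatment indicators, y: outcomes.
    Equal-width bins and a fixed n_bins are illustrative assumptions.
    """
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=int)
    y = np.asarray(y, dtype=float)
    span = t.max() - t.min()
    if span == 0:
        bins = np.zeros(len(t), dtype=int)
    else:
        # Map each time to a bin index in 0..n_bins-1 (the maximum goes to the last bin).
        raw = ((t - t.min()) / span * n_bins).astype(int)
        bins = np.minimum(raw, n_bins - 1)
    weighted_sum, n_used = 0.0, 0
    for b in range(n_bins):
        in_bin = bins == b
        treated = in_bin & (w == 1)
        control = in_bin & (w == 0)
        if not treated.any() or not control.any():
            continue  # assumption: skip strata that lack one of the arms
        n_b = int(in_bin.sum())
        weighted_sum += n_b * (y[treated].mean() - y[control].mean())
        n_used += n_b
    if n_used == 0:
        raise ValueError("no time bin contains both treated and control units")
    return weighted_sum / n_used


def time_grouped_assignment(n_units, group_size, rng):
    """Assign treatment in consecutive arrival-order groups.

    Within each group, half of the units (rounded down) are treated, so
    the arms stay balanced over time. This is one plausible reading of
    'time-grouped randomization', not the paper's exact procedure.
    """
    w = np.empty(n_units, dtype=int)
    for start in range(0, n_units, group_size):
        size = min(group_size, n_units - start)
        block = np.array([1] * (size // 2) + [0] * (size - size // 2))
        w[start:start + size] = rng.permutation(block)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    t = np.sort(rng.uniform(0.0, 24.0, size=n))      # hypothetical hour of day
    w = time_grouped_assignment(n, group_size=4, rng=rng)
    baseline = 10.0 + 5.0 * np.sin(2 * np.pi * t / 24.0)  # time-of-day effect
    y = baseline + 1.0 * w + rng.normal(0.0, 1.0, size=n)
    print("post-stratified estimate:", post_stratified_ate(t, w, y, n_bins=24))
```

Because each time bin compares treated and control units that arrived at similar times, the time-of-day baseline largely cancels within a bin, which is the intuition behind the variance reduction the abstract describes.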
Pages: 2079-2089
Number of pages: 11
Related papers
15 in total
  • [1] Asmussen S., 2007, STOCHASTIC SIMULATIO, Vfirst
  • [2] Cohen Peter L, 2020, ARXIV201209246
  • [3] Deng A., 2013, P 6 ACM INT C WEB SE, P123, DOI 10.1145/2433396.2433413
  • [4] Gupta S., 2019, ACM SIGKDD EXPLORATI, V21, P20, DOI 10.1145/3331651.3331655
  • [5] On the role of the propensity score in efficient semiparametric estimation of average treatment effects
    Hahn, JY
    [J]. ECONOMETRICA, 1998, 66 (02) : 315 - 331
  • [6] TIME-UNIFORM, NONPARAMETRIC, NONASYMPTOTIC CONFIDENCE SEQUENCES
    Howard, Steven R.
    Ramdas, Aaditya
    McAuliffe, Jon
    Sekhon, Jasjeet
    [J]. ANNALS OF STATISTICS, 2021, 49 (02) : 1055 - 1080
  • [7] Always Valid Inference: Continuous Monitoring of A/B Tests
    Johari, Ramesh
    Koomen, Pete
    Pekelis, Leonid
    Walsh, David
    [J]. OPERATIONS RESEARCH, 2021, 70 (03) : 1806 - 1821
  • [8] Peeking at A/B Tests: Why it matters, and what to do about it
    Johari, Ramesh
    Koomen, Pete
    Pekelis, Leonid
    Walsh, David
    [J]. KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 1517 - 1525
  • [9] Kohavi R, 2020, TRUSTWORTHY ONLINE CONTROLLED EXPERIMENTS: A PRACTICAL GUIDE TO A/B TESTING, P1, DOI 10.1017/9781108653985
  • [10] Kohavi R., 2017, ENCY MACHINE LEARNIN, VFirst, P922, DOI [10.1007/978-1-4899-7687-1, DOI 10.1007/978-1-4899-7687-1_891]