Non-stationary A/B Tests

Cited by: 3

Authors:

Wu, Yuhang [1]

Zheng, Zeyu [1]

Zhang, Guangyu [2]

Zhang, Zuohua [2]

Wang, Chu [2]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Amazon, Seattle, WA USA

Source:

PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022

Keywords:

A/B test; non-stationarity; statistical inference; bias correction; variance reduction; central limit theorem

DOI:

10.1145/3534678.3539325

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

A/B tests, also known as online controlled experiments, have been used at scale by data-driven enterprises to guide decisions and test innovative ideas. Meanwhile, nonstationarity, such as the time-of-day effect, commonly arises in various business metrics. We show that inadequately addressing nonstationarity can cause A/B tests to be statistically inefficient or invalid, leading to wrong conclusions. To address these issues, we develop a new framework that provides appropriate modeling and adequate statistical analysis for nonstationary A/B tests. Without changing the infrastructure for any existing A/B test procedure, we propose a new estimator that views time as a continuous covariate to perform post-stratification with a sample-dependent number of stratification levels. We prove a central limit theorem in a natural limiting regime under nonstationarity, so that valid large-sample statistical inference is available. We show that the proposed estimator achieves the optimal asymptotic variance among all estimators. When the experiment design phase of an A/B test allows, we propose a new time-grouped randomization approach that better balances treatment and control assignments in the presence of time nonstationarity. A brief account of numerical experiments is given to illustrate the theoretical analysis.
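As a rough illustration of the post-stratification idea mentioned in the abstract, the sketch below bins units by arrival time, takes the treatment-minus-control difference within each bin, and averages those differences weighted by bin size. It is not the authors' estimator: the equal-width bins, the cube-root rule for choosing the number of strata, the simulated sinusoidal time-of-day effect, and all function names are assumptions made only for this example.

```python
import math
import random
from statistics import mean


def naive_difference(treated, outcomes):
    """Plain difference in means that ignores arrival time."""
    treat = [y for y, w in zip(outcomes, treated) if w]
    ctrl = [y for y, w in zip(outcomes, treated) if not w]
    return mean(treat) - mean(ctrl)


def post_stratified_difference(times, treated, outcomes, horizon, num_strata):
    """Size-weighted average of within-bin differences in means.

    Time in [0, horizon] is cut into num_strata equal-width bins
    (an assumed binning scheme, chosen for simplicity).
    """
    bins = [{"t": [], "c": []} for _ in range(num_strata)]
    for time, w, y in zip(times, treated, outcomes):
        k = min(int(time / horizon * num_strata), num_strata - 1)
        bins[k]["t" if w else "c"].append(y)
    # A bin missing either arm has no within-bin contrast; skip it
    # and renormalise the weights over the remaining bins.
    usable = [b for b in bins if b["t"] and b["c"]]
    total = sum(len(b["t"]) + len(b["c"]) for b in usable)
    return sum(
        (len(b["t"]) + len(b["c"])) / total * (mean(b["t"]) - mean(b["c"]))
        for b in usable
    )


def simulate(n, seed):
    """Hypothetical data: 24-hour sinusoidal baseline plus a true effect of 1.0."""
    rng = random.Random(seed)
    times = [rng.uniform(0.0, 24.0) for _ in range(n)]
    treated = [rng.random() < 0.5 for _ in range(n)]
    outcomes = [
        10.0
        + 5.0 * math.sin(2.0 * math.pi * t / 24.0)
        + (1.0 if w else 0.0)
        + rng.gauss(0.0, 1.0)
        for t, w in zip(times, treated)
    ]
    return times, treated, outcomes


if __name__ == "__main__":
    n = 2000
    # Sample-dependent number of strata; the cube-root rule is an assumption.
    num_strata = max(1, round(n ** (1 / 3)))
    times, treated, outcomes = simulate(n, seed=0)
    print("naive:", naive_difference(treated, outcomes))
    print(
        "post-stratified:",
        post_stratified_difference(times, treated, outcomes, 24.0, num_strata),
    )
```

Because the within-bin comparison removes most of the time-of-day swing from the baseline, the stratified estimate should typically vary less across repeated simulations than the naive one, which is the kind of variance reduction the keywords refer to.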


Pages: 2079-2089

Page count: 11

