Trustworthy Experimentation Under Telemetry Loss

Cited by: 6

Authors:

Gupchup, Jayant [1]

Hosseinkashi, Yasaman [1]

Dmitriev, Pavel [1,2]

Schneider, Daniel [1]

Cutler, Ross [1]

Jefremov, Andrei [1]

Ellis, Martin [1]

Affiliations:

[1] Microsoft, Redmond, WA 98052 USA

[2] Outreach Io, Seattle, WA USA

Source:

CIKM'18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management | 2018

Keywords:

Online controlled experiments; AB testing; client experimentation; telemetry loss; data loss; experimentation trustworthiness

DOI:

10.1145/3269206.3271747

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Failure to accurately measure the outcomes of an experiment can lead to bias and incorrect conclusions. Online controlled experiments (aka AB tests) are increasingly being used to make decisions to improve websites as well as mobile and desktop applications. We argue that loss of telemetry data (during upload or post-processing) can skew the results of experiments, leading to loss of statistical power and inaccurate or erroneous conclusions. By systematically investigating the causes of telemetry loss, we argue that it is not practical to entirely eliminate it. Consequently, experimentation systems need to be robust to its effects. Furthermore, we note that it is nontrivial to measure the absolute level of telemetry loss in an experimentation system. In this paper, we take a top-down approach towards solving this problem. We motivate the impact of loss qualitatively using experiments in real applications deployed at scale, and formalize the problem by presenting a theoretical breakdown of the bias introduced by loss. Based on this foundation, we present a general framework for quantitatively evaluating the impact of telemetry loss, and present two solutions to measure the absolute levels of loss. This framework is used by well-known applications at Microsoft, with millions of users and billions of sessions. These general principles can be adopted by any application to improve the overall trustworthiness of experimentation and data-driven decision making.

Citation

Pages: 387-396

Page count: 10
