Modeling and Ranking Flaky Tests at Apple

Cited by: 38
Authors
Kowalczyk, Emily [1]
Nair, Karan [1]
Gao, Zebao [1]
Silberstein, Leo [1]
Long, Teng [1]
Memon, Atif [1]
Affiliations
[1] Apple Inc, Cupertino, CA 95014 USA
Source
2020 IEEE/ACM 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING IN PRACTICE (ICSE-SEIP) | 2020
DOI
10.1145/3377813.3381370
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Test flakiness, the inability to reliably repeat a test's Pass/Fail outcome, continues to be a significant problem in industry, adversely impacting continuous integration and test pipelines. Completely eliminating flaky tests is not a realistic option, as a significant fraction of system tests (typically non-hermetic) for services-based implementations exhibit some level of flakiness. In this paper, we view the flakiness of a test as a rankable value, which we quantify, track, and assign a confidence. We develop two ways to model flakiness: capturing the randomness of test results via entropy and the temporal variation via flipRate, and aggregating these over time. We have implemented our flakiness scoring service and discuss how its adoption has impacted the test suites of two large services at Apple. We show how flakiness is distributed across the tests in these services, including typical score ranges and outliers. The flakiness scores are used to monitor and detect changes in flakiness trends. Evaluation results demonstrate near-perfect accuracy in ranking, identification, and alignment with human interpretation. The scores were used to identify two causes of flakiness in the dataset evaluated, which have been confirmed, and for which fixes have been implemented or are underway. Our models reduced flakiness by 44% with less than 1% loss in fault detection.
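The abstract names two flakiness measures, entropy and flipRate, without giving their formulas. The following is a minimal sketch of what such measures could look like, not the authors' implementation. It assumes that entropy is the Shannon entropy (in bits) of the Pass/Fail proportions in a test's history and that the flip rate is the fraction of consecutive runs whose outcome changed. The function names `entropy` and `flip_rate` and the boolean encoding of results (True for Pass) are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def entropy(results: Sequence[bool]) -> float:
    """Shannon entropy, in bits, of a Pass/Fail history (assumed definition).

    Returns 0.0 for an empty history or one with a single repeated outcome.
    """
    if not results:
        return 0.0
    p_pass = sum(results) / len(results)
    total = 0.0
    for p in (p_pass, 1.0 - p_pass):
        if p > 0.0:  # skip zero-probability outcomes to avoid log2(0)
            total -= p * math.log2(p)
    return total


def flip_rate(results: Sequence[bool]) -> float:
    """Fraction of consecutive run pairs whose outcome changed (assumed definition)."""
    if len(results) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(results, results[1:]) if a != b)
    return flips / (len(results) - 1)


if __name__ == "__main__":
    alternating = [True, False, True, False]  # outcome changes on every run
    clustered = [True, True, False, False]    # outcome changes once
    print(entropy(alternating), flip_rate(alternating))
    print(entropy(clustered), flip_rate(clustered))
```

Under these assumed definitions the two histories in the usage example have the same entropy, because both are half Pass and half Fail, but different flip rates, because the order of results differs. This is one way to read the abstract's distinction between the randomness of results and their temporal variation.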
Pages: 110-119
Page count: 10