Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models

Cited by: 0
Authors
Owain Parry
Gregory M. Kapfhammer
Michael Hilton
Phil McMinn
Affiliations
[1] University of Sheffield
[2] Allegheny College
[3] Carnegie Mellon University
Source
Empirical Software Engineering | 2023 | Volume 28
Keywords
Software testing; Flaky tests; Machine learning
DOI
Not available
Abstract
A flaky test is a test case whose outcome changes without modification to the code of the test case or the program under test. These tests disrupt continuous integration, cause a loss of developer productivity, and limit the efficiency of testing. Many flaky test detection techniques are rerunning-based, meaning they require repeated test case executions at a considerable time cost, or are machine learning-based, and thus they are fast but offer only an approximate solution with variable detection performance. These two extremes leave developers with a stark choice. This paper introduces CANNIER, an approach for reducing the time cost of rerunning-based detection techniques by combining them with machine learning models. The empirical evaluation involving 89,668 test cases from 30 Python projects demonstrates that CANNIER can reduce the time cost of existing rerunning-based techniques by an order of magnitude while maintaining a detection performance that is significantly better than machine learning models alone. Furthermore, the comprehensive study extends existing work on machine learning-based detection and reveals a number of additional findings, including (1) the performance of machine learning models for detecting polluter test cases; (2) using the mean values of dynamic test case features from repeated measurements can slightly improve the detection performance of machine learning models; and (3) correlations between various test case features and the probability of the test case being flaky.