Empirically Evaluating Readily Available Information for Regression Test Optimization in Continuous Integration

Cited by: 39
Authors
Elsner, Daniel [1 ]
Hauer, Florian [1 ]
Pretschner, Alexander [1 ]
Reimer, Silke [2 ]
Affiliations
[1] Technical University of Munich, Munich, Germany
[2] IVU Traffic Technologies, Berlin, Germany
Source
ISSTA '21: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021
Keywords
software testing; regression test optimization; machine learning; selection; prioritization
DOI
10.1145/3460319.3464834
CLC Classification Number
TP31 [Computer Software]
Discipline Classification Code
081202; 0835
Abstract
Regression test selection (RTS) and prioritization (RTP) techniques aim to reduce testing effort and developer feedback time after a change to the code base. Using various information sources, including test traces, build dependencies, version control data, and test histories, they have been shown to be effective. However, not all of these sources are guaranteed to be available and accessible in arbitrary continuous integration (CI) environments. In contrast, metadata from version control systems (VCSs) and CI systems is readily available and inexpensive. Yet, corresponding RTP and RTS techniques are scattered across research and often only evaluated on synthetic faults or in a specific industrial context. It is cumbersome for practitioners to identify insights that apply to their context, let alone to calibrate associated parameters for maximum cost-effectiveness. This paper consolidates existing work on RTP and unsafe RTS into an actionable methodology for building and evaluating approaches that exclusively rely on CI and VCS metadata. To investigate how these approaches from prior research compare in heterogeneous settings, we apply the methodology in a large-scale empirical study on a set of 23 projects covering 37,000 CI logs and 76,000 VCS commits. We find that these approaches significantly outperform established RTP baselines, and that with unsafe RTS practitioners can expect to save on average 84% of test execution time while still triggering 90% of the failures. We also find that it can be beneficial to limit training data, that features from test history work better than change-based features, and that, somewhat surprisingly, simple and well-known heuristics often outperform complex machine-learned models.
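As a rough illustration of the kind of lightweight, metadata-only approach the abstract describes (not the paper's actual implementation), the Python sketch below prioritizes tests by a recency-weighted failure history and then performs unsafe RTS by keeping only the highest-ranked tests that fit into a fraction of the full suite's runtime. All names (TestRecord, failure_score, the 50-run window) are hypothetical, and the 16% budget is only an illustrative default echoing the reported average saving of 84% of execution time.

from dataclasses import dataclass, field
from typing import List

@dataclass
class TestRecord:
    name: str
    duration: float                                      # average execution time in seconds
    outcomes: List[bool] = field(default_factory=list)   # True = failed; most recent outcome last

def failure_score(rec: TestRecord, window: int = 50) -> float:
    """Score a test by its recent failure history (higher = more likely to fail again)."""
    recent = rec.outcomes[-window:]
    if not recent:
        return 0.0
    # Weight recent outcomes more heavily than older ones.
    weights = [i + 1 for i in range(len(recent))]
    return sum(w for w, failed in zip(weights, recent) if failed) / sum(weights)

def prioritize(tests: List[TestRecord]) -> List[TestRecord]:
    """Regression test prioritization: order tests by descending failure score."""
    return sorted(tests, key=failure_score, reverse=True)

def select_within_budget(tests: List[TestRecord], budget_fraction: float = 0.16) -> List[TestRecord]:
    """Unsafe regression test selection: run only the highest-priority tests that fit
    into a fraction of the full suite's execution time."""
    budget = budget_fraction * sum(t.duration for t in tests)
    selected, used = [], 0.0
    for t in prioritize(tests):
        if used + t.duration <= budget:
            selected.append(t)
            used += t.duration
    return selected

Such a heuristic needs nothing beyond CI test-result history and per-test durations, which is what makes this family of techniques applicable even when coverage traces or build dependencies are unavailable.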
Pages: 491-504 (14 pages)