Improving the prediction of continuous integration build failures using deep learning

Cited by: 24
Authors
Saidani, Islem [1]
Ouni, Ali [1]
Mkaouer, Mohamed Wiem [2]
Affiliations
[1] Univ Quebec, ETS Montreal, Montreal, PQ, Canada
[2] Rochester Inst Technol, Rochester, NY 14623 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Continuous integration; Build prediction; Travis CI; Genetic algorithm; Long short term memory; Machine learning; Hyper-parameters optimization; Concept drift; HYPER-PARAMETER OPTIMIZATION; NEURAL-NETWORKS; CLASSIFICATION; SEARCH; IMPACT; LSTM;
DOI
10.1007/s10515-021-00319-5
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Continuous Integration (CI) aims to help developers integrate code changes frequently and quickly through an automated build process. However, the build process is typically time- and resource-consuming: a failing build can run for hours before the breakage is discovered, which may disrupt the development process and delay product releases. Hence, preemptively detecting when a software state is most likely to trigger a build failure is of crucial importance for developers. Accurate build failure prediction techniques can reduce CI build costs by predicting potential failures early. However, developing accurate prediction models is challenging, as it requires learning long- and short-term dependencies in the historical CI build data, as well as extensive feature engineering to derive informative features to learn from. In this paper, we introduce DL-CIBuild, a novel approach that uses Long Short-Term Memory (LSTM)-based Recurrent Neural Networks (RNN) to construct models for CI build outcome prediction. The problem consists of a single series of CI build outcomes, and a model is required to learn from the series of past observations to predict the next CI build outcome in the sequence. In addition, we tailor a Genetic Algorithm (GA) to tune the hyper-parameters of our LSTM model. We evaluate our approach in both cross-project and online prediction scenarios on a benchmark of 91,330 CI builds from 10 large and long-lived software projects that use the Travis CI build system. The statistical analysis of the obtained results shows that the LSTM-based model outperforms traditional Machine Learning (ML) models under both online and cross-project validation. DL-CIBuild is also less sensitive to the training set size and is robust to concept drift. Additionally, by considering several Hyper-Parameter Optimization (HPO) methods as baselines for GA, we demonstrate that GA performs the best.
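The framing of the problem as a single series of build outcomes, where past observations are used to predict the next outcome, can be illustrated with a minimal sliding-window sketch. This is not the authors' implementation: the function name `make_windows`, the parameter `time_steps`, the 1 = failed / 0 = passed encoding, and the sample history are all assumptions made for illustration. The abstract does not state a window length.

```python
from typing import List, Sequence, Tuple


def make_windows(
    outcomes: Sequence[int], time_steps: int
) -> Tuple[List[List[int]], List[int]]:
    """Turn one chronological series of build outcomes into
    (window of past outcomes, next outcome) training pairs."""
    if time_steps < 1:
        raise ValueError("time_steps must be >= 1")
    windows: List[List[int]] = []
    targets: List[int] = []
    # Each window holds the `time_steps` outcomes that precede the target build.
    for end in range(time_steps, len(outcomes)):
        windows.append(list(outcomes[end - time_steps:end]))
        targets.append(outcomes[end])
    return windows, targets


# Hypothetical build history, oldest first (assumed encoding: 1 = failed, 0 = passed).
history = [0, 0, 1, 1, 0, 1]
X, y = make_windows(history, time_steps=3)
print(X)  # [[0, 0, 1], [0, 1, 1], [1, 1, 0]]
print(y)  # [1, 0, 1]
```

Each row of `X` could then be fed to a sequence model such as an LSTM, with the matching entry of `y` as the label. In a setup like the one the abstract describes, a window length such as `time_steps` would be a natural candidate for tuning by the hyper-parameter search, though the abstract does not say which hyper-parameters are tuned.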
Pages: 61