Improving the prediction of continuous integration build failures using deep learning

Cited by: 24
Authors
Saidani, Islem [1]
Ouni, Ali [1]
Mkaouer, Mohamed Wiem [2]
Affiliations
[1] Univ Quebec, ETS Montreal, Montreal, PQ, Canada
[2] Rochester Inst Technol, Rochester, NY 14623 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Continuous integration; Build prediction; Travis CI; Genetic algorithm; Long short term memory; Machine learning; Hyper-parameters optimization; Concept drift; HYPER-PARAMETER OPTIMIZATION; NEURAL-NETWORKS; CLASSIFICATION; SEARCH; IMPACT; LSTM;
DOI
10.1007/s10515-021-00319-5
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Continuous Integration (CI) aims at supporting developers in integrating code changes constantly and quickly through an automated build process. However, the build process is typically time- and resource-consuming, as a failed build can run for hours before the breakage is discovered, which may disrupt the development process and delay product release dates. Hence, preemptively detecting when a software state is most likely to trigger a failure during the build is of crucial importance for developers. Accurate build failure prediction techniques can cut CI build costs by predicting potential failures early. However, developing accurate prediction models is a challenging task, as it requires learning long- and short-term dependencies in the historical CI build data as well as extensive feature engineering to derive informative features to learn from. In this paper, we introduce DL-CIBuild, a novel approach that uses Long Short-Term Memory (LSTM)-based Recurrent Neural Networks (RNN) to construct prediction models for CI build outcome prediction. The problem consists of a single series of CI build outcomes, and the model is required to learn from the series of past observations to predict the next CI build outcome in the sequence. In addition, we tailor a Genetic Algorithm (GA) to tune the hyper-parameters of our LSTM model. We evaluate our approach and investigate its performance in both cross-project and online prediction scenarios on a benchmark of 91,330 CI builds from 10 large and long-lived software projects that use the Travis CI build system. The statistical analysis of the obtained results shows that the LSTM-based model outperforms traditional Machine Learning (ML) models under both online and cross-project validation. DL-CIBuild is also less sensitive to the training set size and is robust to concept drift. Additionally, by considering several Hyper-Parameter Optimization (HPO) methods as baselines for GA, we demonstrate that GA performs best.
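
The abstract frames build prediction as learning from a single series of past build outcomes to predict the next one. A minimal sketch of that framing is shown below, assuming outcomes are encoded as 1 for a failed build and 0 for a passed build. The function name `make_windows` and the window length `time_step` are hypothetical names chosen for illustration; they are not taken from the paper, and the paper's actual preprocessing may differ.

```python
from typing import List, Tuple


def make_windows(outcomes: List[int], time_step: int) -> Tuple[List[List[int]], List[int]]:
    """Turn one series of build outcomes into (past window, next outcome) pairs.

    Assumption: 1 = failed build, 0 = passed build, in chronological order.
    """
    if time_step < 1:
        raise ValueError("time_step must be at least 1")
    inputs: List[List[int]] = []
    targets: List[int] = []
    # Each sample uses `time_step` consecutive past outcomes as input
    # and the outcome that immediately follows them as the label.
    for start in range(len(outcomes) - time_step):
        inputs.append(outcomes[start:start + time_step])
        targets.append(outcomes[start + time_step])
    return inputs, targets


# Hypothetical build history, oldest build first.
history = [0, 0, 1, 0, 1, 1, 0]
X, y = make_windows(history, time_step=3)
for window, label in zip(X, y):
    print(window, "->", label)
```

Pairs of this shape are what a sequence model such as an LSTM would typically be trained on. The window length is one example of a hyper-parameter that a search method such as a genetic algorithm could tune.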
Pages: 61