Improving defect prediction with deep forest

Cited by: 76
Authors
Zhou, Tianchi [1]
Sun, Xiaobing [1, 4]
Xia, Xin [2]
Li, Bin [1]
Chen, Xiang [3]
Affiliations
[1] Yangzhou Univ, Sch Informat Engn, Yangzhou, Jiangsu, Peoples R China
[2] Monash Univ, Fac Informat Technol, Melbourne, Vic, Australia
[3] Northwestern Polytech Univ, Sch Software, Xian, Shaanxi, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
Keywords
Software defect prediction; Deep forest; Cascade strategy; Empirical evaluation; MODEL; FRAMEWORK
DOI
10.1016/j.infsof.2019.07.003
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Context: Software defect prediction is important for ensuring software quality. Many supervised learning techniques have been applied to identify defective instances (e.g., methods, classes, and modules). Objective: However, the performance of these supervised learning techniques is still far from satisfactory, so more advanced techniques are needed to improve the performance of defect prediction models. Method: We propose a new deep forest model for building defect prediction models (DPDF). This model can identify more important defect features by using a new cascade strategy, which arranges random forest classifiers into a layer-by-layer structure. This design takes full advantage of both ensemble learning and deep learning. Results: We evaluate our approach on 25 open-source projects from four public datasets (i.e., NASA, PROMISE, AEEEM, and Relink). Experimental results show that our approach increases the AUC value by 5% compared with the best traditional machine learning algorithms. Conclusion: The deep strategy in DPDF is effective for software defect prediction.
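The cascade strategy summarized in the abstract can be illustrated with a minimal sketch of a generic deep forest (gcForest-style) cascade, assuming scikit-learn and NumPy are available. This is not the DPDF implementation. The class name `CascadeForest`, the choice of one random forest plus one ExtraTrees forest per level, the 3-fold out-of-fold class vectors, the validation-AUC stopping rule, all hyperparameters, and the synthetic data are assumptions made for illustration. Each level's class-probability vectors are appended to the original features to form the next level's input, and AUC is used for evaluation, as in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split


class CascadeForest:
    """Hypothetical gcForest-style cascade (illustrative, not DPDF)."""

    def __init__(self, n_estimators=50, max_levels=5, cv=3, tol=1e-4, random_state=0):
        self.n_estimators = n_estimators
        self.max_levels = max_levels
        self.cv = cv
        self.tol = tol
        self.random_state = random_state
        self.levels_ = []

    def _new_level(self, seed):
        # Assumed level layout: one random forest plus one ExtraTrees forest with
        # max_features=1, which approximates a completely-random tree forest.
        return [
            RandomForestClassifier(n_estimators=self.n_estimators, random_state=seed),
            ExtraTreesClassifier(n_estimators=self.n_estimators, max_features=1,
                                 random_state=seed),
        ]

    def fit(self, X, y, X_val, y_val):
        best_auc = -np.inf
        aug_train, aug_val = X, X_val
        for level_idx in range(self.max_levels):
            forests = self._new_level(self.random_state + level_idx)
            train_vecs, val_vecs = [], []
            for forest in forests:
                # Out-of-fold class probabilities, so the next level never sees
                # predictions a forest made on its own training samples.
                train_vecs.append(cross_val_predict(forest, aug_train, y, cv=self.cv,
                                                    method="predict_proba"))
                forest.fit(aug_train, y)
                val_vecs.append(forest.predict_proba(aug_val))
            val_auc = roc_auc_score(y_val, np.mean(val_vecs, axis=0)[:, 1])
            if val_auc <= best_auc + self.tol:
                break  # discard this level: validation AUC stopped improving
            best_auc = val_auc
            self.levels_.append(forests)
            # Next level input = original features + this level's class vectors.
            aug_train = np.hstack([X] + train_vecs)
            aug_val = np.hstack([X_val] + val_vecs)
        return self

    def predict_proba(self, X):
        aug = X
        for forests in self.levels_:
            vecs = [forest.predict_proba(aug) for forest in forests]
            aug = np.hstack([X] + vecs)
        # Final prediction: average the class vectors of the last kept level.
        return np.mean(vecs, axis=0)


# Usage on synthetic, imbalanced data (a stand-in for a defect dataset).
X, y = make_classification(n_samples=600, n_features=20, weights=[0.8, 0.2],
                           random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.3, stratify=y,
                                                random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                            stratify=y_tmp, random_state=0)
model = CascadeForest().fit(X_tr, y_tr, X_val, y_val)
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"levels kept: {len(model.levels_)}, test AUC: {test_auc:.3f}")
```

Averaging the last level's class vectors mirrors how a gcForest cascade produces its final prediction, and letting validation AUC decide the depth means the number of levels adapts to the data instead of being fixed in advance. The sketch omits multi-grained scanning and any DPDF-specific details.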
Pages: 204-216
Page count: 13