DeepCPDP: Deep Learning Based Cross-Project Defect Prediction

Cited by: 27
Authors
Chen, Deyu [1]
Chen, Xiang [1]
Li, Hao [2]
Xie, Junfeng [3]
Mu, Yanzhou [2]
Affiliations
[1] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
[3] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Funding
National Natural Science Foundation of China
Keywords
Software defect prediction; cross-project defect prediction; bi-directional long short-term memory; embedding method; attention mechanism; FEATURE-SELECTION; MODEL; FRAMEWORK;
DOI
10.1109/ACCESS.2019.2961129
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-project defect prediction (CPDP) is an active research topic in software defect prediction because it applies when the target project is new or does not have enough labeled modules. Most previous work used labeled datasets gathered from other projects (i.e., the source projects) and proposed transfer learning based methods to reduce the data distribution difference between projects. In this article, we propose DeepCPDP, a deep learning based CPDP method. We represent the source code of each extracted program module as a simplified abstract syntax tree (SimAST). For each SimAST node we keep only its node type, which is project-independent, and ignore method and variable names, since this information is project-specific. SimAST is therefore project-independent and especially suitable for CPDP. We then extract a token vector from each module after it has been modeled via SimAST, and we design a new unsupervised embedding method, SimASTToken2Vec, to learn meaningful representations of these token vectors. Next, we employ a Bi-directional Long Short-Term Memory (BiLSTM) neural network to automatically learn semantic features from the embedded token vectors, and we apply an attention mechanism over the BiLSTM layer to learn the weights of the vectors in the learned semantic features. Finally, we construct CPDP models with a logistic regression classifier. To show the effectiveness of DeepCPDP, we use ten large-scale projects from different application domains and measure the prediction performance of the trained models with AUC. Using the Scott-Knott test, we find that DeepCPDP significantly outperforms eight state-of-the-art baselines. Moreover, we also verify that the usage of SimASTToken2Vec, BiLSTM and attention mechanism is competitive in our proposed method.
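The SimAST idea in the abstract (keep node types, drop method and variable names) can be illustrated with a minimal sketch. This is not the authors' implementation: Python's standard `ast` module stands in for whatever parser the paper uses, the function name `simast_tokens` is hypothetical, and the pre-order traversal is an assumption, since the abstract does not state how the token vector is ordered.

```python
import ast
from typing import List


def simast_tokens(source: str) -> List[str]:
    """Return the AST node type names of `source` in pre-order.

    Identifiers and literal values are never recorded, so two modules
    that differ only in naming produce the same token sequence.
    (Hypothetical helper; traversal order is an assumption.)
    """
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Keep only the node type, e.g. "FunctionDef" or "BinOp".
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


# Two functions with the same structure but project-specific names.
module_a = "def add(x, y):\n    return x + y\n"
module_b = "def merge(left, right):\n    return left + right\n"
# A structurally different function.
module_c = "def scale(x):\n    if x:\n        return x * 2\n    return 0\n"

same_structure = simast_tokens(module_a) == simast_tokens(module_b)
different_structure = simast_tokens(module_a) != simast_tokens(module_c)
```

Here `same_structure` and `different_structure` are both true: renaming methods and variables leaves the token sequence unchanged, while a change in control flow alters it. In the method described above, sequences like these would then be embedded and passed to the BiLSTM with attention.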
Pages: 184832 - 184848
Page count: 17