DeepCPDP: Deep Learning Based Cross-Project Defect Prediction

Cited by: 27
Authors
Chen, Deyu [1]
Chen, Xiang [1]
Li, Hao [2]
Xie, Junfeng [3]
Mu, Yanzhou [2]
Affiliations
[1] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
[3] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Funding
National Natural Science Foundation of China;
Keywords
Software defect prediction; cross-project defect prediction; bi-directional long short-term memory; embedding method; attention mechanism; FEATURE-SELECTION; MODEL; FRAMEWORK;
DOI
10.1109/ACCESS.2019.2961129
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Cross-project defect prediction (CPDP) is an active research topic in software defect prediction because it applies to two scenarios: the target project is a new project, or the target project does not have enough labeled modules. Most previous work uses labeled datasets gathered from other projects (i.e., the source projects) and proposes transfer-learning-based methods to reduce the difference in data distribution between projects. In this article, we propose DeepCPDP, a deep-learning-based CPDP method. DeepCPDP represents the source code of each extracted program module as a simplified abstract syntax tree (SimAST). For each SimAST node, we keep only its node type, which is project-independent, and discard method and variable names, which are project-specific. SimAST is therefore project-independent and especially suitable for CPDP. We then extract a token vector from each module once it is modeled as a SimAST, and we design a new unsupervised embedding method, SimASTToken2Vec, to learn meaningful representations of these token vectors. Next, we employ a Bi-directional Long Short-Term Memory (BiLSTM) neural network to automatically learn semantic features from the embedded token vectors, and we apply an attention mechanism over the BiLSTM layer to learn weights for the learned semantic features. Finally, we construct CPDP models with a logistic regression classifier. To evaluate DeepCPDP, we use ten large-scale projects from different application domains and measure the prediction performance of the trained models with AUC. The Scott-Knott test shows that DeepCPDP significantly outperforms eight state-of-the-art baselines.
Moreover, we verify that the use of SimASTToken2Vec, BiLSTM, and the attention mechanism in DeepCPDP is competitive.
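The SimAST idea in the abstract (keep each node's type, drop method and variable names, then read off a token vector) can be illustrated with a small sketch. It assumes Python source parsed with the standard-library `ast` module as a stand-in for the Java parser the paper would use, so node-type names differ from the paper's, and it keeps every node type rather than whatever subset SimAST actually retains. The helpers `simast_tokens`, `build_vocab`, and `encode` and the ids `PAD_ID`/`UNK_ID` are hypothetical names for illustration; SimASTToken2Vec itself is not reproduced.

```python
import ast
from typing import Dict, List

PAD_ID = 0  # hypothetical: reserved for padding
UNK_ID = 1  # hypothetical: reserved for node types unseen when the vocab was built


def simast_tokens(source: str) -> List[str]:
    """Return a module's AST node types in depth-first pre-order.

    Only the node type is kept (e.g. "FunctionDef", "BinOp"); identifier
    names and literal values are dropped because they are project-specific.
    Assumption: Python's AST also yields context nodes such as "Load" and
    operator nodes such as "Mult"; a real SimAST likely keeps a chosen subset.
    """
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):  # yields only AST nodes, never names
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_seqs: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct node type an integer id, starting after UNK_ID."""
    vocab: Dict[str, int] = {}
    for seq in token_seqs:
        for tok in seq:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in seq[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    # Two hypothetical modules that differ only in identifier names.
    a = "def area(width, height):\n    return width * height\n"
    b = "def size(p, q):\n    return p * q\n"
    ta, tb = simast_tokens(a), simast_tokens(b)
    print(ta == tb)  # prints True: names are gone, so the sequences match
    vocab = build_vocab([ta])
    print(encode(ta, vocab, max_len=16))
```

Because identifiers never enter the sequence, modules from different projects share one vocabulary of node types, which is the property the abstract relies on for CPDP. The fixed-length id sequences are the kind of input an embedding layer followed by a BiLSTM with attention would consume; how DeepCPDP configures those layers is not described in this record.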
Pages: 184832 - 184848
Number of pages: 17
Related papers
50 in total
  • [21] Selective Pseudo-Labeling Based Subspace Learning for Cross-Project Defect Prediction
    Sun, Ying
    Jing, Xiao-Yuan
    Wu, Fei
    Sun, Yanfei
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (09) : 2003 - 2006
  • [22] Efficient Cross-Project Software Defect Prediction Based on Federated Meta-Learning
    Chen, Haisong
    Yang, Linlin
    Wang, Aili
    ELECTRONICS, 2024, 13 (06)
  • [23] An Empirical Study on the Effectiveness of Feature Selection for Cross-Project Defect Prediction
    Yu, Qiao
    Qian, Junyan
    Jiang, Shujuan
    Wu, Zhenhua
    Zhang, Gongjie
    IEEE ACCESS, 2019, 7 : 35710 - 35718
  • [24] Unsupervised Domain Adaptation Based on Discriminative Subspace Learning for Cross-Project Defect Prediction
    Sun, Ying
    Sun, Yanfei
    Qi, Jin
    Wu, Fei
    Jing, Xiao-Yuan
    Xue, Yu
    Shen, Zixin
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (03) : 3373 - 3389
  • [25] Cross-project software defect prediction based on the reduction and hybridization of software metrics
    Abdu, Ahmed
    Zhai, Zhengjun
    Abdo, Hakim A.
    Lee, Sungon
    Al-masni, Mohammed A.
    Gu, Yeong Hyeon
    Algabri, Redhwan
    ALEXANDRIA ENGINEERING JOURNAL, 2025, 112 : 161 - 176
  • [26] An Empirical Study on Combining Source Selection and Transfer Learning for Cross-Project Defect Prediction
    Wen, Wanzhi
    Zhang, Bin
    Gu, Xiang
    Ju, Xiaolin
    2019 IEEE 1ST INTERNATIONAL WORKSHOP ON INTELLIGENT BUG FIXING (IBF '19), 2019 : 29 - 38
  • [27] An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction
    Qiu, Shaojian
    Lu, Lu
    Jiang, Siyu
    Guo, Yang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (12)
  • [28] Joint feature representation learning and progressive distribution matching for cross-project defect prediction
    Zou, Quanyi
    Lu, Lu
    Yang, Zhanyu
    Gu, Xiaowei
    Qiu, Shaojian
    INFORMATION AND SOFTWARE TECHNOLOGY, 2021, 137 (137)
  • [29] Cross-project defect prediction based on G-LSTM model
    Xing, Ying
    Qian, Xiaomeng
    Guan, Yu
    Yang, Bin
    Zhang, Yuwei
    PATTERN RECOGNITION LETTERS, 2022, 160 : 50 - 57
  • [30] Cross-Project Defect Prediction Method Based on Manifold Feature Transformation
    Zhao, Yu
    Zhu, Yi
    Yu, Qiao
    Chen, Xiaoying
    FUTURE INTERNET, 2021, 13 (08)