A Cross-Project Defect Prediction Model Based on Deep Learning With Self-Attention

Cited by: 5

Authors:

Wen, Wanzhi [1,2]

Zhang, Ruinian [1]

Wang, Chuyue [1]

Shen, Chenqiang [1]

Yu, Meng [1]

Zhang, Suchuan [1]

Gao, Xinxin [1]

Affiliations:

[1] Nantong University, School of Information Science and Technology, Nantong 226019, People's Republic of China

[2] Nanjing University of Aeronautics and Astronautics, Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing 211106, People's Republic of China

Source:

IEEE Access | 2022 | Vol. 10

Keywords:

Semantics; Codes; Feature extraction; Software; Predictive models; Logic gates; Syntactics; Deep learning; Short long term memory; Defect prediction; deep learning; long and short-term memory; self-attention mechanism; SOFTWARE; ALGORITHM

DOI:

10.1109/ACCESS.2022.3214536

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Cross-project defect prediction is an active topic in software defect research, largely because of the large difference in data distribution between the source project and the target project. Software defect prediction techniques usually first extract features from a software project and then train prediction models based on various classifiers. However, traditional features lack sufficient semantic information about the source code, which results in poor performance of the prediction models. To construct more accurate prediction models based on semantic information, we propose a cross-project defect prediction framework named BSLDP, which extracts semantic information from source code files through a bidirectional long short-term memory network with a self-attention mechanism. In particular, we provide a semantic extractor named ALC that extracts source code semantics from source code files, and we propose a classification algorithm named BSL that builds a prediction model from the semantic information of the source project and the target project. Furthermore, we propose an equal meshing mechanism in which the numerical token vector is divided into small fragments and ALC generates semantic information on each fragment, to further improve the performance of the proposed model. We evaluated the performance of the proposed model on the publicly available PROMISE dataset. Compared with four state-of-the-art methods, the experimental results indicate that, on average, BSLDP improves the performance of cross-project defect prediction in terms of F1 by 14.2%, 34.6%, 32.2% and 23.6%, respectively.
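
The abstract describes the equal meshing mechanism only at a high level: the numerical token vector is divided into small fragments before semantic extraction. As a rough illustration of that idea, and not the authors' implementation, the sketch below splits a list of integer token IDs into fragments of equal length. The function name `equal_mesh`, the `fragment_len` parameter, and zero-padding of the last fragment are assumptions made for this example; the record does not state how the paper sizes or pads the fragments.

```python
from typing import List


def equal_mesh(token_ids: List[int], fragment_len: int, pad_id: int = 0) -> List[List[int]]:
    """Split a numeric token vector into equal-length fragments.

    Hypothetical helper: the padding strategy (pad_id) and fragment size
    are assumptions, not details taken from the paper.
    """
    if fragment_len <= 0:
        raise ValueError("fragment_len must be positive")
    fragments = []
    for start in range(0, len(token_ids), fragment_len):
        # Slicing copies, so the caller's list is never modified.
        fragment = token_ids[start:start + fragment_len]
        # Pad the final fragment so every fragment has the same length.
        fragment += [pad_id] * (fragment_len - len(fragment))
        fragments.append(fragment)
    return fragments


tokens = [12, 7, 3, 45, 9, 21, 8]
print(equal_mesh(tokens, fragment_len=3))
# [[12, 7, 3], [45, 9, 21], [8, 0, 0]]
```

In a pipeline of the kind the abstract outlines, each fragment would then be passed to a sequence encoder separately; that step is omitted here because the record gives no details about it.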

Pages: 110385-110401

Page count: 17
