Seml: A Semantic LSTM Model for Software Defect Prediction

Cited by: 64
Authors
Liang, Hongliang [1]
Yu, Yue [1]
Jiang, Lin [1]
Xie, Zhuosi [1]
Affiliation
[1] Beijing University of Posts and Telecommunications, Beijing 100876, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Defect prediction; Long Short-Term Memory network; word embedding
DOI
10.1109/ACCESS.2019.2925313
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Software defect prediction can assist developers in finding potential bugs and reducing maintenance cost. Traditional approaches usually use software metrics (Lines of Code, Cyclomatic Complexity, etc.) as features to build classifiers that identify defective software modules. However, software metrics often fail to capture the syntactic and semantic information of programs. In this paper, we propose Seml, a novel framework that combines word embedding and deep learning methods for defect prediction. Specifically, for each program source file, we first extract a token sequence from its abstract syntax tree. Then, we map each token in the sequence to a real-valued vector using a mapping table, which is trained with an unsupervised word embedding model. Finally, we use the vector sequences and their labels (defective or non-defective) to build a Long Short-Term Memory (LSTM) network. The LSTM model can automatically learn the semantic information of programs and perform defect prediction. Evaluation results on eight open-source projects show that Seml outperforms three state-of-the-art defect prediction approaches on most of the datasets, for both within-project and cross-project defect prediction.
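The pipeline described in the abstract (AST token sequence, then embedding lookup, then an LSTM whose final state feeds a defective/non-defective output) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several assumptions are hypothetical: Python source is parsed with the standard `ast` module, the token-selection rule is invented for illustration, a random vector table stands in for the embedding table that the paper trains with an unsupervised word embedding model, and the names `ast_tokens`, `build_embedding_table`, `LSTMCell`, `defect_probability` and all dimensions are illustrative. The LSTM uses the standard gate equations. Weights are untrained and no training loop is shown, so the printed probability only demonstrates the data flow.

```python
import ast
import math
import random

# Hypothetical token-selection rule: keep definition names, names of directly
# called functions, and the type names of these control-flow statements.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.Return, ast.Raise)


def ast_tokens(source: str) -> list[str]:
    """Return a token sequence from a pre-order traversal of the source's AST."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(node.name)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            tokens.append(node.func.id)
        elif isinstance(node, CONTROL_NODES):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_embedding_table(vocab: set[str], dim: int, rng: random.Random) -> dict[str, list[float]]:
    """Hypothetical stand-in for a trained embedding table: one random vector per token."""
    return {tok: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for tok in sorted(vocab)}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) and candidate (g) gates."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {gate: mat(hidden_dim, input_dim) for gate in "ifog"}   # input weights
        self.U = {gate: mat(hidden_dim, hidden_dim) for gate in "ifog"}  # recurrent weights
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}           # biases

    def step(self, x, h, c):
        # Pre-activation of each gate: W x + U h + b.
        pre = {
            gate: [wx + uh + bb for wx, uh, bb in
                   zip(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])]
            for gate in "ifog"
        }
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g = [math.tanh(z) for z in pre["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]  # c_t = f*c_{t-1} + i*g
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]           # h_t = o*tanh(c_t)
        return h_new, c_new

    def run(self, xs: list[list[float]]) -> list[float]:
        """Feed the whole vector sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h


def defect_probability(source, table, cell, w_out, b_out) -> float:
    """Embed the AST tokens (unknown tokens are skipped), run the LSTM, apply a sigmoid output."""
    vectors = [table[t] for t in ast_tokens(source) if t in table]
    h = cell.run(vectors)
    return sigmoid(sum(wj * hj for wj, hj in zip(w_out, h)) + b_out)


SAMPLE_SOURCE = """
def parse(path):
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            print(line)
    return None
"""

if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ast_tokens(SAMPLE_SOURCE)
    print(tokens)  # ['parse', 'With', 'open', 'For', 'If', 'print', 'Return']
    table = build_embedding_table(set(tokens), dim=8, rng=rng)
    cell = LSTMCell(input_dim=8, hidden_dim=4, rng=rng)
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(4)]
    # Weights are untrained, so this number only demonstrates the data flow.
    print(f"P(defective) = {defect_probability(SAMPLE_SOURCE, table, cell, w_out, 0.0):.3f}")
```

In a real setting the embedding table would come from an unsupervised model trained on the project's token sequences, and the LSTM and output weights would be fitted to the defective/non-defective labels; keeping only the final hidden state is one common way to turn a variable-length sequence into a single classification input.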
Pages: 83812-83824
Number of pages: 13