Simple or Complex? Together for a More Accurate Just-In-Time Defect Predictor

Cited by: 8
Authors
Zhou, Xin [1]
Han, DongGyun [1]
Lo, David [1]
Affiliations
[1] Singapore Management Univ, Sch Comp & Informat Syst, Singapore, Singapore
Source
30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022) | 2022
Funding
National Research Foundation, Singapore
Keywords
FUSION METHODS; BUGS;
DOI
10.1145/3524610.3527910
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Just-In-Time (JIT) defect prediction aims to automatically predict whether a commit is defective, and has been widely studied in recent years. Most studies fall into two categories: 1) simple models that use traditional machine learning classifiers with hand-crafted features, and 2) complex models that use deep learning techniques to extract features automatically. The hand-crafted features used by simple models are based on expert knowledge but may not fully represent the semantic meaning of commits. Conversely, the deep learning-based features used by complex models represent the semantic meaning of commits but may not reflect useful expert knowledge. Simple and complex models therefore appear to be complementary to some extent. To exploit the advantages of both, we propose a combined model, SimCom, which fuses the prediction scores of one simple model and one complex model. The experimental results show that our approach significantly outperforms the state of the art by 6.0-18.1%. In addition, our experimental results confirm that the simple model and the complex model are complementary to each other.
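The abstract says that SimCom fuses the prediction scores of one simple and one complex model, but it does not give the fusion rule. The sketch below illustrates one common form of score-level fusion, a weighted average of two defect probabilities. The averaging rule, the function names `fuse_scores` and `predict_defective`, and the `weight` and `threshold` parameters are assumptions made for illustration and are not taken from the paper.

```python
def fuse_scores(simple_score: float, complex_score: float, weight: float = 0.5) -> float:
    """Combine two defect probabilities (each in [0, 1]) by weighted averaging.

    `weight` is the share given to the simple model; 0.5 is a plain average.
    Averaging is an assumed fusion rule, not necessarily the one used by SimCom.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * simple_score + (1.0 - weight) * complex_score


def predict_defective(simple_score: float, complex_score: float,
                      threshold: float = 0.5, weight: float = 0.5) -> bool:
    """Flag a commit as defective when the fused score reaches the threshold."""
    return fuse_scores(simple_score, complex_score, weight) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for one commit from the two models.
    print(predict_defective(simple_score=0.9, complex_score=0.3))
```

With equal weights, a commit that one model scores high and the other scores low can still be flagged, which is one way the complementarity described in the abstract could show up in practice.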
Pages: 229-240
Page count: 12