Learning Feature Representations from Change Dependency Graphs for Defect Prediction

Cited by: 5
Authors
Loyola, Pablo [1]
Matsuo, Yutaka [1]
Affiliation
[1] University of Tokyo, Graduate School of Engineering, Tokyo, Japan
Source
2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), 2017
DOI
10.1109/ISSRE.2017.30
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given the heterogeneity of the data that can be extracted from the software development process, defect prediction techniques have focused on associating different data sources with the introduction of faulty code, usually relying on handcrafted features. While these efforts have produced considerable progress over the years, little attention has been paid to the fact that the performance of any predictive model depends heavily on how the data are represented, and that different representations can lead to different results. We consider this a relevant problem, as it could directly affect efforts to build safer software systems. We therefore study the impact of data representation on defect prediction models. To this end, we focus on developer activity data, from which we build dependency graphs. Then, instead of manually generating features such as network metrics, we propose two models, inspired by recent advances in representation learning, that automatically generate feature representations from graph data. These learned representations are compared against manually crafted features for defect prediction on real-world software projects. Our results show that automatically learned features are competitive, improving prediction performance by up to 13%.
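The abstract does not say exactly how the dependency graphs are built or what the two learned models are. As a minimal sketch, assuming a co-change notion of dependency (two files are linked when they are modified in the same commit), the Python below contrasts the two kinds of input the abstract compares: handcrafted network metrics (degree, weighted degree, degree centrality) and the truncated uniform random walks that a DeepWalk-style model would feed to skip-gram to learn node embeddings. The function names, parameters, and toy commit history are hypothetical, and the random-walk step is a swapped-in illustration of graph representation learning, not the authors' models.

```python
import random
from itertools import combinations


def build_cochange_graph(commits):
    """Build an undirected, weighted co-change graph.

    Assumption (hypothetical): each commit is a list of changed file paths,
    and two files are linked whenever they change in the same commit. The
    edge weight counts how many commits they share.
    """
    graph = {}
    for files in commits:
        unique = sorted(set(files))
        for f in unique:
            graph.setdefault(f, {})  # keep files that never co-change as isolated nodes
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def network_metrics(graph):
    """Handcrafted features: simple per-node network metrics."""
    n = len(graph)
    features = {}
    for node, neighbors in graph.items():
        degree = len(neighbors)
        features[node] = {
            "degree": degree,
            "weighted_degree": sum(neighbors.values()),
            "degree_centrality": degree / (n - 1) if n > 1 else 0.0,
        }
    return features


def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Truncated uniform random walks: the corpus a DeepWalk-style model
    (an illustrative stand-in, not the paper's method) would feed to
    skip-gram to learn one embedding per node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = sorted(graph)
        rng.shuffle(starts)
        for start in starts:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # isolated node: the walk stops early
                    break
                walk.append(rng.choice(sorted(neighbors)))
            walks.append(walk)
    return walks


# Hypothetical toy history: each inner list is the set of files in one commit.
commits = [
    ["core/parser.py", "core/lexer.py"],
    ["core/parser.py", "core/ast.py", "tests/test_parser.py"],
    ["docs/readme.md"],
]
graph = build_cochange_graph(commits)
metrics = network_metrics(graph)
print(metrics["core/parser.py"])
# {'degree': 3, 'weighted_degree': 3, 'degree_centrality': 0.75}
walks = random_walks(graph)
print(len(walks))  # 10: two walks from each of the 5 files
```

In a DeepWalk-style pipeline, the `walks` corpus would then be passed to a word-embedding trainer to obtain one vector per file, which could replace or complement the handcrafted `metrics` as classifier input; that training step is omitted here to keep the sketch dependency-free.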
Pages: 361-372
Number of pages: 12
References (63 in total)
[1] Alam, Omar; Adams, Bram; Hassan, Ahmed E. A Study of the Time Dependence of Code Changes. 16th Working Conference on Reverse Engineering (WCRE 2009), 2009: 21-30.
[2] Allamanis M, 2016, PR MACH LEARN RES, V48
[3] Andersen-Gott, Morten; Ghinea, Gheorghita; Bygstad, Bendik. Why do commercial companies contribute to open source software? International Journal of Information Management, 2012, 32(2): 106-117.
[4] [Anonymous], arXiv:1605.02115
[5] [Anonymous], 1957, Studies in Linguistic Analysis
[6] [Anonymous], 2008, Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, DOI 10.1145/1453101.1453106
[7] [Anonymous], 2005, INT WORKSHOP ARTIFIC
[8] [Anonymous], 2008, 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE)
[9] [Anonymous], 2014, arXiv:1409.3358
[10] Bengio Y, 2001, ADV NEUR IN, V13, P932