Learning Feature Representations from Change Dependency Graphs for Defect Prediction

Cited by: 5
Authors
Loyola, Pablo [1]
Matsuo, Yutaka [1]
Affiliation
[1] University of Tokyo, Graduate School of Engineering, Tokyo, Japan
Source
2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE) | 2017
DOI
10.1109/ISSRE.2017.30
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given the heterogeneity of the data that can be extracted from the software development process, defect prediction techniques have focused on associating different sources of data with the introduction of faulty code, usually relying on handcrafted features. While these efforts have produced considerable progress over the years, little attention has been paid to the fact that the performance of any predictive model depends heavily on how the data are represented, and that different representations can lead to different results. We consider this a relevant problem, as it could directly affect efforts to build safer software systems. Therefore, we propose to study the impact of data representation on defect prediction models. To this end, we focus on developer activity data, from which we build dependency graphs. Then, instead of manually generating features such as network metrics, we propose two models, inspired by recent advances in representation learning, that automatically generate feature representations from graph data. These new representations are compared against manually crafted features for defect prediction in real-world software projects. Our results show that automatically learned features are competitive, improving prediction performance by up to 13%.
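The abstract contrasts handcrafted graph features (such as network metrics) with representations learned automatically from the same graph. This record does not describe the authors' two models, so the Python sketch below is only an illustration under stated assumptions. It treats the dependency graph as a file co-change graph, where two files are linked when they change in the same commit and the edge is weighted by how often that happens. It uses degree and weighted degree as the handcrafted features. As the learned representation, it swaps in a simple graph-factorization embedding trained with stochastic gradient descent. The commit data, file names, and all function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_cochange_graph(commits):
    """Hypothetical co-change graph: files are nodes, and an edge weight counts
    how many commits touched both files."""
    weights = defaultdict(int)
    nodes = set()
    for files in commits:
        unique = sorted(set(files))
        nodes.update(unique)
        for a, b in combinations(unique, 2):  # a < b, so keys are canonical
            weights[(a, b)] += 1
    return sorted(nodes), dict(weights)


def handcrafted_features(nodes, weights):
    """Manual network metrics per file: [degree, weighted degree]."""
    degree = {n: 0 for n in nodes}
    strength = {n: 0 for n in nodes}
    for (a, b), w in weights.items():
        degree[a] += 1
        degree[b] += 1
        strength[a] += w
        strength[b] += w
    return {n: [degree[n], strength[n]] for n in nodes}


def learned_features(nodes, weights, dim=2, epochs=200, lr=0.05, seed=0):
    """Graph factorization (a stand-in, not the paper's model): learn one vector
    per file so that dot(u_a, u_b) approximates the co-change weight A_ab."""
    rng = random.Random(seed)
    index = {n: i for i, n in enumerate(nodes)}
    emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in nodes]
    # Every unordered pair is a training example; unlinked pairs have target 0.
    pairs = [(index[a], index[b], float(weights.get((a, b), 0)))
             for a, b in combinations(nodes, 2)]

    def loss():
        return sum((t - dot(emb[i], emb[j])) ** 2 for i, j, t in pairs)

    history = [loss()]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j, target in pairs:
            err = target - dot(emb[i], emb[j])
            for k in range(dim):
                ui, uj = emb[i][k], emb[j][k]
                # SGD step on the squared error (constant factor folded into lr).
                emb[i][k] += lr * err * uj
                emb[j][k] += lr * err * ui
        history.append(loss())
    return {n: emb[index[n]] for n in nodes}, history


if __name__ == "__main__":
    # Hypothetical commits, each listing the files it modified.
    commits = [
        ["core.py", "utils.py"],
        ["core.py", "utils.py", "api.py"],
        ["api.py", "docs.md"],
        ["core.py", "tests.py"],
    ]
    nodes, weights = build_cochange_graph(commits)
    manual = handcrafted_features(nodes, weights)
    learned, history = learned_features(nodes, weights)
    print(manual["core.py"])  # [3, 4]
    print(round(history[0], 3), "->", round(history[-1], 3))
```

In a defect prediction setting, either feature dictionary could be passed to the same classifier so that only the representation changes between runs. That comparison setup is also an assumption made for illustration, not a description of the paper's experiments.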
Pages: 361-372
Page count: 12