Using contextual information to predict co-changes

Cited by: 14
Authors
Wiese, Igor Scaliante [1]
Re, Reginaldo [1]
Steinmacher, Igor [1]
Kuroda, Rodrigo Takashi [2]
Oliva, Gustavo Ansaldi [3]
Treude, Christoph [3]
Gerosa, Marco Aurelio [3]
Affiliations
[1] Fed Univ Technol Parana UTFPR, Comp Sci Dept, Campo Mourao, PR, Brazil
[2] Fed Univ Technol Parana UTFPR, PPGI PPGI UTFPR CP, Campo Mourao, PR, Brazil
[3] Univ Sao Paulo, Dept Comp Sci, Sao Paulo, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Contextual information; Co-change prediction; Software change context; Change coupling; Change propagation; Change impact analysis;
DOI
10.1016/j.jss.2016.07.016
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Background: Co-change prediction makes developers aware of which artifacts will change together with the artifact they are working on. In the past, researchers relied on structural analysis to build prediction models. More recently, hybrid approaches relying on historical information and textual analysis have been proposed. Despite the advances in the area, software developers still do not use these approaches widely, presumably because of the number of false recommendations. We conjecture that the contextual information of software changes collected from issues, developers' communication, and commit meta-data captures the change patterns of software artifacts and can improve the prediction models.
Objective: Our goal is to develop more accurate co-change prediction models by using contextual information from software changes.
Method: We selected pairs of files based on relevant association rules and built a prediction model for each pair relying on their associated contextual information. We evaluated our approach on two open source projects, namely Apache CXF and Derby. Besides calculating model accuracy metrics, we also performed a feature selection analysis to identify the best predictors when characterizing co-changes and to reduce overfitting.
Results: Our models presented low rates of false negatives (approximately 8% on average) and false positives (approximately 11% on average). We obtained prediction models with AUC values ranging from 0.89 to 1.00, and our models outperformed association rules, our baseline model, when we compared their precision values. Commit-related metrics were the most frequently selected ones for both projects. On average, 6 out of 23 metrics were necessary to build the classifiers.
Conclusions: Prediction models based on contextual information from software changes are accurate and, consequently, they can be used to support software maintenance and evolution, warning developers when they miss relevant artifacts while performing a software change. © 2016 Elsevier Inc. All rights reserved.
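The abstract mentions selecting file pairs through association rules, which also serve as the baseline. The sketch below illustrates one common way to mine such rules from commit history using support and confidence. It is not the authors' implementation: the function name `co_change_rules`, the thresholds, and the sample commits are all hypothetical, and the paper's actual rule-mining setup is not described in this record.

```python
from collections import Counter
from itertools import combinations


def co_change_rules(commits, min_support=2, min_confidence=0.5):
    """Mine pairwise co-change rules (lhs -> rhs) from a list of commits.

    Each commit is an iterable of file names changed together.
    support    = number of commits that changed both files
    confidence = support / number of commits that changed lhs
    Thresholds are illustrative assumptions, not values from the paper.
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))  # ignore duplicates within a commit
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        # A pair yields two directed rules with different confidences.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / file_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken by file names for determinism.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical commit history used only to exercise the function.
commits = [
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["A.java", "C.java"],
    ["B.java"],
]
rules = co_change_rules(commits)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support}, confidence={confidence:.2f}")
```

In the approach the abstract describes, pairs surfaced by rules like these would each get their own classifier trained on contextual metrics (issue, communication, and commit data); that second stage is not sketched here.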
Pages: 220-235
Number of pages: 16