Change distilling:: Tree differencing for fine-grained source code change extraction

被引:396
作者
Fluri, Beat [1 ]
Wuersch, Michael [1 ]
Pinzger, Martin [1 ]
Gall, Harald C. [1 ]
机构
[1] Univ Zurich, Dept Informat, CH-8050 Zurich, Switzerland
关键词
source code change extraction; tree-differencing algorithms; software repositories; software evolution analysis;
D O I
10.1109/TSE.2007.70731
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine- grained source code change extraction. For that, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data [ 8]. Our algorithm extracts changes by finding both a match between the nodes of the compared two abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine- grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark that we developed, which consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: Our algorithm approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script, that is, we reach a mean absolute percentage error of 34 percent, compared to the 79 percent reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
引用
收藏
页码:725 / 743
页数:19
相关论文
共 45 条
[1]   USE OF AN ASSOCIATION MEASURE BASED ON CHARACTER STRUCTURE TO IDENTIFY SEMANTICALLY RELATED PAIRS OF WORDS AND DOCUMENTS TITLES [J].
ADAMSON, GW ;
BOREHAM, J .
INFORMATION STORAGE AND RETRIEVAL, 1974, 10 (7-8) :253-260
[2]  
[Anonymous], 2002, ALGORITHMS TREES GRA
[3]  
[Anonymous], 2005, SOFTWARE ENG
[4]   JDiff: A differencing technique and tool for object-oriented programs [J].
Apiwattanapong, Taweesup ;
Orso, Alessandro ;
Harrold, Mary Jean .
AUTOMATED SOFTWARE ENGINEERING, 2007, 14 (01) :3-36
[5]  
ASKLUND U, 1994, P NORD WORKSH PROGR, P231
[6]   Clone detection using abstract syntax trees [J].
Baxter, ID ;
Yahin, A ;
Moura, L ;
Sant'Anna, M ;
Bier, L .
INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, PROCEEDINGS, 1998, :368-377
[7]  
BERNSTEIN A, 2005, GENERIC JAVA LIB SIM
[8]   A survey on tree edit distance and related problems [J].
Bille, P .
THEORETICAL COMPUTER SCIENCE, 2005, 337 (1-3) :217-239
[9]  
CANFORA G, 2007, P 4 INT WORKSH MIN S, P14
[10]  
CHAWATHE S, 1996, P ACM SIGMOD INT C M, P493