Untangling Composite Commits by Attributed Graph Clustering

Times cited: 3
Authors
Chen, Siyu [1]
Xu, Shengbin [1]
Yao, Yuan [1]
Xu, Feng [1]
Affiliation
[1] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, People's Republic of China
Source
13th Asia-Pacific Symposium on Internetware, Internetware 2022 | 2022
Funding
National Natural Science Foundation of China;
Keywords
Commit untangling; code dependency graph; attributed graph clustering; DEFECT PREDICTION;
DOI
10.1145/3545258.3545267
Chinese Library Classification number
TP301 [Theory, methods];
Discipline code
081202;
Abstract
During software development, it is considered best practice for each commit to address one distinct concern, such as fixing a bug or adding a new feature. However, developers do not always follow this practice and sometimes tangle multiple concerns into a single composite commit. This makes automatic commit untangling a necessary task, and recent approaches mainly untangle commits by applying graph clustering to the code dependency graph. In this paper, we propose a new commit untangling approach, COMUNT, which decomposes composite commits into atomic ones. Unlike existing approaches, COMUNT is built on the observation that both the textual content of code statements and the dependencies between them carry useful semantic information for comprehending the committed code changes. Based on this observation, COMUNT first constructs an attributed graph for each commit, in which code statements and the various code dependencies are modeled as nodes and edges, respectively, and the textual body of each code statement is kept as a node attribute. It then performs attributed graph clustering on the constructed graph. The clustering algorithm encodes graph structure and node attributes simultaneously, so that it can better separate the code changes into clusters with distinct concerns. We evaluate our approach on nine C# projects, and the experimental results show that COMUNT improves on the state of the art by 7.8% in untangling accuracy while running more than six times faster.
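The abstract does not name the specific attributed graph clustering algorithm COMUNT uses, so the following is only a toy sketch of the general idea under stated assumptions, not the authors' method. Each changed statement becomes a node whose attribute is a bag of identifier tokens; dependency edges are given as index pairs. Structure and text are combined by averaging each node's token vector with its neighbours' (a simple graph filter), and nodes are then grouped by single-linkage clustering on cosine similarity, cut at a threshold. All names here (`untangle`, `smooth`, `threshold`, `steps`) are hypothetical, and the C#-like statements are strings rather than parsed code.

```python
import math
import re
from collections import Counter


def tokens(stmt):
    # Crude lexer (assumption): keep identifiers/keywords, drop literals and symbols.
    return re.findall(r"[A-Za-z_]\w*", stmt)


def build_attributed_graph(statements, edges):
    # Node attribute: bag of tokens; edges: undirected code dependencies.
    attrs = [Counter(tokens(s)) for s in statements]
    adj = {i: set() for i in range(len(statements))}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return attrs, adj


def smooth(attrs, adj, steps=1):
    # Graph filtering: average each node's vector with its neighbours' vectors,
    # so the features reflect both text (attributes) and structure (edges).
    feats = [dict(a) for a in attrs]
    for _ in range(steps):
        new = []
        for i, f in enumerate(feats):
            acc = Counter(f)
            for j in adj[i]:
                acc.update(feats[j])
            n = 1 + len(adj[i])
            new.append({t: c / n for t, c in acc.items()})
        feats = new
    return feats


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def untangle(statements, edges, threshold=0.5, steps=1):
    attrs, adj = build_attributed_graph(statements, edges)
    feats = smooth(attrs, adj, steps)
    parent = list(range(len(statements)))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Single-linkage: merge any pair whose similarity reaches the threshold.
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if cosine(feats[i], feats[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(feats)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    commit = [
        "total = price * quantity;",       # concern A
        "if (total < 0) total = 0;",       # concern A
        "logger.Info(message);",           # concern B
        "message = FormatMessage(user);",  # concern B
    ]
    deps = [(0, 1), (2, 3)]  # hypothetical data dependencies
    print(untangle(commit, deps))  # [[0, 1], [2, 3]]
```

The threshold and number of smoothing steps are illustrative knobs; a real untangler would need a principled way to choose the number of concerns, which this sketch does not attempt.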
Pages: 117-126
Page count: 10