Untangling Composite Commits by Attributed Graph Clustering

Times cited: 3
Authors
Chen, Siyu [1]
Xu, Shengbin [1]
Yao, Yuan [1]
Xu, Feng [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Source
13TH ASIA-PACIFIC SYMPOSIUM ON INTERNETWARE, INTERNETWARE 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Commit untangling; code dependency graph; attributed graph clustering; defect prediction
DOI
10.1145/3545258.3545267
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
During software development, it is considered best practice for each commit to represent one distinct concern, such as fixing a bug or adding a new feature. However, developers do not always follow this practice and sometimes tangle multiple concerns into a single composite commit. This makes automatic commit untangling a necessary task, and recent approaches mainly untangle commits by applying graph clustering to the code dependency graph. In this paper, we propose a new commit untangling approach, COMUNT, to decompose composite commits into atomic ones. Unlike existing approaches, COMUNT builds on the observation that both the textual content of code statements and the dependencies between them carry semantic information that helps in comprehending the committed code changes. Based on this observation, COMUNT first constructs an attributed graph for each commit, where code statements and various code dependencies are modeled as nodes and edges, respectively, and the textual body of each code statement is kept as a node attribute. It then conducts attributed graph clustering on the constructed graph. The attributed graph clustering algorithm simultaneously encodes both graph structure and node attributes so as to better separate the code changes into clusters with distinct concerns. We evaluate our approach on nine C# projects, and the experimental results show that COMUNT improves on the state of the art by 7.8% in untangling accuracy while running more than 6 times faster.
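The two steps described in the abstract (build an attributed graph, then cluster using both structure and attributes) can be illustrated with a small sketch. This is not COMUNT's algorithm, which the abstract does not specify; as a stand-in it averages each statement's token counts with those of its dependency neighbors and then merges statements whose smoothed vectors have cosine similarity above a threshold. All names (`tokenize`, `build_attributed_graph`, `smooth`, `untangle`), the sample statements, and the threshold of 0.5 are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(statement):
    # Crude lexer (assumption): lowercase runs of letters and underscores.
    return re.findall(r"[a-z_]+", statement.lower())

def build_attributed_graph(statements, dependencies):
    # Nodes are statement indices; each node's attribute is its token counts.
    attrs = [Counter(tokenize(s)) for s in statements]
    adj = {i: set() for i in range(len(statements))}
    for a, b in dependencies:  # dependencies are treated as undirected edges
        adj[a].add(b)
        adj[b].add(a)
    return attrs, adj

def smooth(attrs, adj):
    # Mix structure into attributes: average each node with its neighbors.
    smoothed = []
    for i, own in enumerate(attrs):
        group = [own] + [attrs[j] for j in sorted(adj[i])]
        merged = Counter()
        for counts in group:
            for token, n in counts.items():
                merged[token] += n / len(group)
        smoothed.append(merged)
    return smoothed

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def untangle(statements, dependencies, threshold=0.5):
    # Single-link merging (a stand-in clustering step, not the paper's).
    attrs, adj = build_attributed_graph(statements, dependencies)
    feats = smooth(attrs, adj)
    parent = list(range(len(statements)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if cosine(feats[i], feats[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(statements)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical changed statements from one composite commit.
STATEMENTS = [
    "int retries = config.GetRetries();",
    "client.SetRetries(retries);",
    "logger.Info(message);",
    "string message = Format(user);",
    "logger.Warn(message);",
]
# Hypothetical data dependencies between statement indices.
DEPENDENCIES = [(0, 1), (3, 2)]

print(untangle(STATEMENTS, DEPENDENCIES))  # [[0, 1], [2, 3, 4]]
```

In this toy input, statement 4 has no dependency edge and joins the logging cluster only through its text, while statements 0 and 1 are grouped mainly through their dependency edge. Calling `untangle(STATEMENTS, [])` leaves statements 0 and 1 in separate clusters, which is the kind of case where combining both signals helps.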
Pages: 117-126
Number of pages: 10