Classifying Code Commits with Convolutional Neural Networks

Cited by: 4
Authors
Meng, Na [1]
Jiang, Zijian [1]
Zhong, Hao [2]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
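Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Funding
National Key Research and Development Program of China
Keywords
Program commit; classification; deep learning
DOI
10.1109/IJCNN52387.2021.9533534
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract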
Developers change software programs for various purposes (e.g., bug fixes, feature additions, and code refactorings), but the intents of code changes are often unrecorded or poorly documented. To automatically infer the change intent of each program commit (i.e., a set of code changes), existing work classifies commits based on commit messages and/or the sheer counts of edited files, lines, or abstract syntax tree (AST) nodes. However, none of these tools reasons about the syntactic or semantic dependencies between co-applied changes, nor do they adopt any deep learning method. To better characterize program commits, in this paper we present CClassifier, a new approach that classifies commits by (1) using advanced static program analysis to comprehend the relationships between co-applied edits, (2) representing edits and their relationships via graphs, and (3) applying convolutional neural networks (CNNs) to classify those graphs. Compared with prior work, CClassifier extracts a richer set of features from program changes; it is the first to classify program commits using CNNs. For evaluation, we prepared a benchmark that contains 7,414 code changes from 5 open-source Java projects. On this benchmark, we empirically compared CClassifier and the state-of-the-art approach with five-fold cross validation. On average, when predicting bug-fixing commits within the same projects, CClassifier improved the prediction accuracy from 70% to 72%. More importantly, prior work seldom identifies feature-addition commits, whereas CClassifier successfully identifies such commits in many more scenarios. Our evaluation shows that CClassifier outperforms prior work due to its use of advanced program analysis and CNNs.
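Step (2) of the abstract, representing edits and their relationships as a graph, can be illustrated with a minimal sketch. The abstract does not say how CClassifier encodes its graphs, so everything below is an assumption made for illustration: the `Edit` record, the `commit_to_matrix` helper, the example entity names, and the choice of a symmetric 0/1 adjacency matrix as a grid-shaped input that a CNN could consume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One edited program entity in a commit (hypothetical schema)."""
    entity: str  # e.g. a method signature
    kind: str    # "added", "deleted", or "changed"


def commit_to_matrix(edits, dependencies):
    """Encode a commit as a symmetric 0/1 adjacency matrix.

    edits: list of Edit objects, one per graph node.
    dependencies: (i, j) index pairs meaning edit i is related to edit j.
    This encoding is an illustrative assumption, not the paper's format.
    """
    n = len(edits)
    matrix = [[0] * n for _ in range(n)]
    for i, j in dependencies:
        matrix[i][j] = 1
        matrix[j][i] = 1  # treat relationships as undirected
    return matrix


# A made-up commit with three co-applied edits.
edits = [
    Edit("Parser.parse()", "changed"),
    Edit("Parser.validate()", "added"),
    Edit("ParserTest.testParse()", "changed"),
]
# parse() relates to validate(); the test relates to parse().
dependencies = [(0, 1), (2, 0)]

matrix = commit_to_matrix(edits, dependencies)
for row in matrix:
    print(row)
```

In a real pipeline the node attributes (here only `kind`) would also have to be encoded, and matrices would need padding to a common size before being batched for a CNN; both are left out of this sketch.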
Pages: 8