Classifying Code Commits with Convolutional Neural Networks

Cited by: 4
Authors
Meng, Na [1]
Jiang, Zijian [1]
Zhong, Hao [2]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061 USA
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
Source
2021 International Joint Conference on Neural Networks (IJCNN) | 2021
Funding
National Key R&D Program of China;
Keywords
Program commit; classification; deep learning;
DOI
10.1109/IJCNN52387.2021.9533534
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Developers change software programs for various purposes (e.g., bug fixes, feature additions, and code refactorings), but the intents of code changes are often not recorded or are poorly documented. To automatically infer the change intent of each program commit (i.e., a set of code changes), existing work classifies commits based on commit messages and/or the sheer counts of edited files, lines, or abstract syntax tree (AST) nodes. However, none of these tools reasons about the syntactic or semantic dependencies between co-applied changes, nor do they adopt any deep learning method. To better characterize program commits, in this paper we present CClassifier, a new approach that classifies commits by (1) using advanced static program analysis to comprehend the relationships between co-applied edits, (2) representing edits and their relationships via graphs, and (3) applying convolutional neural networks (CNNs) to classify those graphs. Compared with prior work, CClassifier extracts a richer set of features from program changes; it is the first to classify program commits using CNNs. For evaluation, we prepared a benchmark that contains 7,414 code changes from 5 open-source Java projects. On this benchmark, we empirically compared CClassifier and the state-of-the-art approach with five-fold cross-validation. On average, when predicting bug-fixing commits within the same projects, CClassifier improved the prediction accuracy from 70% to 72%. More importantly, prior work seldom identifies feature-addition commits; CClassifier successfully identifies such commits in many more scenarios. Our evaluation shows that CClassifier outperforms prior work due to its use of advanced program analysis and CNNs.
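The abstract describes the pipeline only at a high level, so the following is a minimal sketch of steps (2) and (3) and not the authors' implementation. It assumes a commit has already been analyzed into a list of edited entities and the dependencies between them. The edit kinds in `EDIT_KINDS`, the choice to store the edit kind on the matrix diagonal, and the helper names `build_adjacency`, `conv2d_valid`, and `global_max_pool` are all hypothetical. The sketch encodes the graph as a matrix, applies one hand-written convolution filter, and pools the result into a single feature; a real CNN would learn many such filters and feed the pooled features to a classifier.

```python
from typing import List, Tuple

# Hypothetical edit kinds; the abstract does not list the real node labels.
EDIT_KINDS = ["added_method", "changed_method", "deleted_method", "added_field"]

Matrix = List[List[float]]


def build_adjacency(edits: List[Tuple[str, str]],
                    dependencies: List[Tuple[str, str]]) -> Matrix:
    """Encode a commit as a square matrix (an assumed encoding).

    edits: (entity_name, edit_kind) pairs, one per edited program entity.
    dependencies: (source_name, target_name) pairs between edited entities.
    The diagonal stores a numeric code for the edit kind; an off-diagonal
    1.0 marks a dependency from the row entity to the column entity.
    """
    index = {name: i for i, (name, _) in enumerate(edits)}
    size = len(edits)
    matrix = [[0.0] * size for _ in range(size)]
    for i, (_, kind) in enumerate(edits):
        matrix[i][i] = float(EDIT_KINDS.index(kind) + 1)
    for source, target in dependencies:
        matrix[index[source]][index[target]] = 1.0
    return matrix


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image without padding (cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def global_max_pool(feature_map: Matrix) -> float:
    """Reduce a feature map to its single strongest activation."""
    return max(max(row) for row in feature_map)


# A made-up commit: one changed method calls a newly added method,
# which in turn reads a newly added field.
edits = [
    ("Foo.bar()", "added_method"),
    ("Foo.baz()", "changed_method"),
    ("Foo.count", "added_field"),
]
dependencies = [("Foo.baz()", "Foo.bar()"), ("Foo.bar()", "Foo.count")]

graph_matrix = build_adjacency(edits, dependencies)
feature_map = conv2d_valid(graph_matrix, [[1.0, 0.0], [0.0, 1.0]])
pooled_feature = global_max_pool(feature_map)
```

Pooling makes the final feature independent of the matrix size, which is one way a classifier could handle commits with different numbers of edits; whether CClassifier does this is not stated in the abstract.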
Pages: 8