Improving code readability classification using convolutional neural networks

Cited by: 32
Authors
Mi, Qing [1]
Keung, Jacky [1]
Xiao, Yan [1]
Mensah, Solomon [1]
Gao, Yujin [2]
Affiliations
[1] City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong, People's Republic of China
[2] Beijing Institute of Technology, School of Computer Science and Technology, Beijing, People's Republic of China
Keywords
Code readability; Convolutional Neural Network; Deep learning; Program comprehension; Empirical software engineering; Open source software
DOI
10.1016/j.infsof.2018.07.006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Code readability classification (classifying a piece of source code as either readable or unreadable) has attracted increasing attention in academia and industry. To construct accurate classification models, previous studies relied mainly on handcrafted features. However, manual feature engineering is usually labor-intensive and captures only partial information about the source code, which is likely to limit model performance.
Objective: To improve code readability classification, we propose the use of Convolutional Neural Networks (ConvNets).
Method: We first introduce a representation strategy (with different granularities) that transforms source code into integer matrices used as input to ConvNets. We then propose DeepCRM, a deep learning-based model for code readability classification. DeepCRM consists of three separate ConvNets with identical architectures, each trained on data preprocessed in a different way. We evaluate our approach against five state-of-the-art code readability models.
Results: The experimental results show that DeepCRM can outperform previous approaches; the improvement in accuracy ranges from 2.4% to 17.2%.
Conclusions: By eliminating the need for manual feature engineering, DeepCRM provides improved performance, confirming the efficacy of deep learning techniques for code readability classification.
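The representation step described under Method (turning source code into integer matrices at different granularities) can be illustrated with a small sketch. The abstract does not specify matrix sizes, the tokenizer, the vocabulary, or the padding scheme, so all of these are assumptions here: the helper names (`char_matrix`, `token_matrix`, `build_vocab`), the 50-line limit, the row widths, padding with 0, reserving 1 for unknown tokens, and expanding tabs to four spaces are hypothetical. The sketch encodes a snippet at character and token granularity into fixed-size integer matrices of the kind a ConvNet could take as input; it is not the DeepCRM preprocessing itself, and the ConvNets are not shown.

```python
import re
from typing import Dict, List

PAD = 0  # hypothetical padding value for empty cells
UNK = 1  # hypothetical id for tokens missing from the vocabulary

# Hypothetical tokenizer: identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def _pad_rows(rows: List[List[int]], max_lines: int, width: int) -> List[List[int]]:
    """Truncate or pad every row to `width`, then truncate or pad the row count to `max_lines`."""
    fixed = [r[:width] + [PAD] * (width - len(r[:width])) for r in rows[:max_lines]]
    fixed += [[PAD] * width for _ in range(max_lines - len(fixed))]
    return fixed


def char_matrix(source: str, max_lines: int = 50, max_cols: int = 120) -> List[List[int]]:
    """Character granularity: one row per source line, each character replaced by its code point.

    Tabs are expanded to four spaces (an assumption) so indentation stays visible in the matrix.
    """
    rows = [[ord(ch) for ch in line.expandtabs(4)] for line in source.splitlines()]
    return _pad_rows(rows, max_lines, max_cols)


def build_vocab(sources: List[str]) -> Dict[str, int]:
    """Assign integer ids to tokens in first-seen order, starting after PAD (0) and UNK (1)."""
    vocab: Dict[str, int] = {}
    for src in sources:
        for tok in TOKEN_RE.findall(src):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def token_matrix(source: str, vocab: Dict[str, int],
                 max_lines: int = 50, max_tokens: int = 40) -> List[List[int]]:
    """Token granularity: one row per source line, each token replaced by its vocabulary id."""
    rows = [[vocab.get(tok, UNK) for tok in TOKEN_RE.findall(line)]
            for line in source.splitlines()]
    return _pad_rows(rows, max_lines, max_tokens)
```

For example, `char_matrix("int x = 1;", max_lines=2, max_cols=4)` gives `[[105, 110, 116, 32], [0, 0, 0, 0]]`. Fixing both dimensions gives every snippet the same input shape, which a ConvNet with a fixed input size needs; the trade-off is that characters or tokens beyond the limits are silently truncated.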
Pages: 60-71
Number of pages: 12