A Neural Model for Method Name Generation from Functional Description

Cited by: 29
Authors
Gao, Sa [1]
Chen, Chunyang [2]
Xing, Zhenchang [3]
Ma, Yukun [1]
Song, Wen [1]
Lin, Shang-Wei [1]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[2] Monash Univ, Fac Informat Technol, Clayton, Vic, Australia
[3] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT, Australia
Source
2019 IEEE 26TH INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER) | 2019
Keywords
Naming Convention; Encoder-Decoder Model; Transfer Learning
DOI
10.1109/saner.2019.8667994
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The names of software artifacts, e.g., method names, are important for software understanding and maintenance, as good names help developers understand others' code. However, developers, especially novices, find it difficult to follow existing naming guidelines and come up with meaningful, concise, and compact names for variables, methods, classes, and files. With the popularity of open source, an enormous amount of project source code is accessible, and the tedium and inconsistency of manually naming methods can now be relieved by automatically learning a naming model from a large code repository. Nevertheless, building a comprehensive naming system remains challenging because of the gap between natural language functional descriptions and method names. Specifically, there are three challenges: how to model the relationship between functional descriptions and formal method names, how to handle the vocabulary explosion when dealing with large repositories, and how to transfer the knowledge learned from large repositories to a specific project. To address these challenges, we propose a neural network that directly generates readable method names from natural language descriptions. The proposed method is built upon the encoder-decoder framework with attention and copying mechanisms. Our experiments show that our method generates meaningful and accurate method names and achieves significant improvement over state-of-the-art baseline models. We also address the cold-start problem using a training trick that leverages big data from GitHub for specific projects.
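The abstract names a copying mechanism as one response to vocabulary explosion. The sketch below illustrates the general idea only: it mixes a distribution over a fixed vocabulary with attention weights over the description tokens, in the style of a pointer-generator mixture. This is an assumption for illustration, not the paper's formulation; the function names, the `p_gen` mixing weight, and all scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_augmented_distribution(vocab, gen_scores, source_tokens, attn_scores, p_gen):
    """Combine 'generate from vocabulary' with 'copy from description'.

    Hypothetical helper: p_gen is the probability of generating from the
    fixed vocabulary; the remaining mass (1 - p_gen) is spread over the
    description tokens in proportion to their attention weights.
    """
    gen_probs = softmax(gen_scores)
    attn_probs = softmax(attn_scores)
    final = {tok: p_gen * p for tok, p in zip(vocab, gen_probs)}
    for tok, a in zip(source_tokens, attn_probs):
        # Tokens missing from the vocabulary still receive probability here.
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final


# Made-up decoder step: "tokenizer" is not in the fixed vocabulary.
vocab = ["get", "set", "name", "<unk>"]
description = ["returns", "the", "tokenizer", "name"]
dist = copy_augmented_distribution(
    vocab, [2.0, 0.5, 1.0, 0.1], description, [0.1, 0.1, 3.0, 1.5], p_gen=0.6
)
```

Because copied tokens draw their probability from attention rather than from the output vocabulary, a rare identifier part such as `tokenizer` can still be emitted without enlarging the vocabulary.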
Pages: 411-421
Page count: 11