Ensemble Models for Neural Source Code Summarization of Subroutines

Cited by: 21
Authors
LeClair, Alexander [1]
Bansal, Aakash [1]
McMillan, Collin [1]
Affiliations
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021) | 2021
Keywords
source code summarization; automatic documentation generation; neural networks;
DOI
10.1109/ICSME52107.2021.00032
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches for code summarization are neural network-based solutions akin to seq2seq, graph2seq, and other encoder-decoder architectures. The input to the encoder is source code, while the decoder helps predict the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality - differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models to exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and provide an explanation for this boost. The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.
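The abstract does not say which ensemble strategy is used, so the following is only a minimal sketch of one common option: at each decoding step, average the next-word probability distributions produced by several models and pick the most likely word. Every name here (`mean_ensemble`, `ensemble_greedy_decode`, `table_model`, the toy vocabulary and probabilities) is hypothetical and chosen for illustration; none of it is taken from the paper.

```python
from typing import Callable, List, Sequence

# Assumption: a "model" is any function mapping (source tokens, summary prefix)
# to a probability distribution over the vocabulary for the next word.
NextWordModel = Callable[[Sequence[str], Sequence[str]], List[float]]


def mean_ensemble(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average several next-word distributions element-wise."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


def ensemble_greedy_decode(
    models: Sequence[NextWordModel],
    source: Sequence[str],
    vocab: Sequence[str],
    end_token: str = "</s>",
    max_len: int = 13,
) -> List[str]:
    """Greedy decoding where each step uses the averaged distribution."""
    summary: List[str] = []
    for _ in range(max_len):
        step = mean_ensemble([m(source, summary) for m in models])
        best = max(range(len(vocab)), key=step.__getitem__)
        if vocab[best] == end_token:
            break
        summary.append(vocab[best])
    return summary


def table_model(table: List[List[float]]) -> NextWordModel:
    """Toy stand-in for a trained model: looks up a fixed distribution per step."""
    def model(source: Sequence[str], prefix: Sequence[str]) -> List[float]:
        return table[min(len(prefix), len(table) - 1)]
    return model


# Toy data (made up): the two models disagree only on the third word.
VOCAB = ["returns", "the", "name", "id", "</s>"]
model_a = table_model([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.30, 0.55, 0.05],  # model A prefers "id"
    [0.00, 0.00, 0.00, 0.00, 1.00],
])
model_b = table_model([
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.05, 0.05],  # model B strongly prefers "name"
    [0.00, 0.00, 0.00, 0.00, 1.00],
])

source_tokens = ["public", "String", "getName", "(", ")"]
print(ensemble_greedy_decode([model_a], source_tokens, VOCAB))
# ['returns', 'the', 'id']
print(ensemble_greedy_decode([model_a, model_b], source_tokens, VOCAB))
# ['returns', 'the', 'name']
```

In this toy run, model A alone picks "id", while the averaged distribution picks "name" because model B is far more confident at that step. This is one way the complementary behavior described in the abstract could be exploited, and it changes only the inference loop, not the training of the individual models.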
Pages: 286-297
Page count: 12