Project-Level Encoding for Neural Source Code Summarization of Subroutines

Times cited: 24

Authors:

Bansal, Aakash [1]

Haque, Sakib [1]

McMillan, Collin [1]

Affiliations:

[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA

Source:

2021 IEEE/ACM 29TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2021) | 2021

Keywords:

source code summarization; automatic documentation generation; neural networks; program comprehension;

DOI:

10.1109/ICPC52881.2021.00032

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification codes:

081202; 0835;

Abstract:

Source code summarization of a subroutine is the task of writing a short, natural language description of that subroutine. The description usually appears in documentation aimed at programmers, where even a brief phrase (e.g., "compresses data to a zip file") can help readers rapidly comprehend what a subroutine does without resorting to reading the code itself. Techniques based on neural networks (and encoder-decoder model designs in particular) have established themselves as the state of the art. Yet a widely recognized problem with these models is that they assume the information needed to create a summary is present within the code being summarized itself, an assumption that is at odds with the program comprehension literature. Thus, a current research frontier is the question of how to encode source code context into neural models of summarization. In this paper, we present a project-level encoder to improve models of code summarization. By project-level, we mean that we create a vectorized representation of selected code files in a software project and use that representation to augment the encoder of state-of-the-art neural code summarization techniques. We demonstrate how our encoder improves several existing models, and we provide guidelines for maximizing improvement while controlling time and resource costs in terms of model size.
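The abstract describes the project-level idea only at a high level: vectorize selected files of a project and use that representation to augment an existing summarization encoder. The toy sketch below illustrates one way such an augmentation could look, assuming (hypothetically) a mean-pooled bag-of-embeddings file encoder, scaled dot-product attention from the subroutine's encoding over the file vectors, and concatenation of the attended project context with the subroutine encoding. None of these choices, nor the names `encode_file`, `attend`, or `augment_encoding`, come from the paper; it is an illustration of the general concept, not the authors' architecture.

```python
import math
from typing import Dict, List

Vector = List[float]


def encode_file(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Hypothetical file encoder: mean of the embeddings of known tokens."""
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim  # file with no known tokens gets a zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention of the subroutine vector over file vectors."""
    if not keys:
        return [0.0] * len(query)  # no project files selected: empty context
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(len(query))]


def augment_encoding(
    subroutine_vec: Vector,
    project_files: List[List[str]],
    table: Dict[str, Vector],
) -> Vector:
    """Concatenate the subroutine encoding with an attended project-level context."""
    dim = len(subroutine_vec)
    file_vecs = [encode_file(tokens, table, dim) for tokens in project_files]
    context = attend(subroutine_vec, file_vecs)
    return subroutine_vec + context  # list concatenation doubles the width


# Usage with made-up 2-d embeddings and two "files" from a hypothetical project.
table = {"zip": [1.0, 0.0], "compress": [0.8, 0.2], "http": [0.0, 1.0]}
files = [["zip", "compress"], ["http"]]
out = augment_encoding([1.0, 0.0], files, table)
print(out)
```

Here the subroutine vector is most similar to the first file, so the attention weights favor it and the project context leans toward that file's vector. In a real model these vectors would be learned, and the augmented encoding would feed the decoder; the concatenation is only one plausible combination strategy.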


Pages: 253-264

Number of pages: 12
