READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization

Cited by: 3
Authors
Choi, Yunseok [1 ]
Na, Cheolwon [2 ]
Kim, Hyojun [2 ]
Lee, Jee-Hyong [2 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Platform Software, Suwon 16419, South Korea
[2] Sungkyunkwan Univ, Dept Artificial Intelligence, Suwon 16419, South Korea
Keywords
Codes; Transformers; Source coding; Syntactics; Data mining; Task analysis; Adaptive systems; Shortest path problem; Abstract syntax tree; adaptive transformer; source code summarization; fusion network; shortest path
DOI
10.1109/ACCESS.2023.3271992
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Code summarization is the process of automatically generating brief and informative summaries of source code to aid in software comprehension and maintenance. In this paper, we propose a novel model called READSUM, REtrieval-augmented ADaptive transformer for source code SUMmarization, that combines both abstractive and extractive approaches. Our proposed model generates code summaries in an abstractive manner, taking into account both the structural and sequential information of the input code, while also utilizing an extractive approach that leverages a retrieved summary of similar code to increase the frequency of important keywords. To effectively blend the original code and the retrieved similar code at the embedding layer stage, we obtain the augmented representation of the original code and the retrieved code through multi-head self-attention. In addition, we develop a self-attention network that adaptively learns the structural and sequential information for the representations in the encoder stage. Furthermore, we design a fusion network to capture the relation between the original code and the retrieved summary at the decoder stage. The fusion network effectively guides summary generation based on the retrieved summary. Finally, READSUM extracts important keywords using an extractive approach and generates high-quality summaries using an abstractive approach that considers both the structural and sequential information of the source code. We demonstrate the superiority of READSUM through various experiments and an ablation study. Additionally, we perform a human evaluation to assess the quality of the generated summary.
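The abstract describes blending the original code and the retrieved similar code at the embedding layer through multi-head self-attention, so that each position can attend over tokens from both sequences. A minimal single-head sketch of that blending step, in plain numpy, is shown below; the function names, dimensions, and the omission of learned projection weights and multiple heads are all simplifying assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d_k):
    """Single-head scaled dot-product self-attention.

    Learned Q/K/V projection matrices and multiple heads are
    omitted for brevity; x is used directly as query, key, value.
    """
    scores = x @ x.T / np.sqrt(d_k)     # pairwise similarity of positions
    return softmax(scores) @ x          # attention-weighted mixture

rng = np.random.default_rng(0)
d = 8
code_emb = rng.normal(size=(5, d))       # embeddings of the original code tokens (hypothetical)
retrieved_emb = rng.normal(size=(4, d))  # embeddings of the retrieved similar code (hypothetical)

# Concatenate the two sequences so every position attends over both the
# original code and the retrieved code, yielding the augmented representation.
joint = np.concatenate([code_emb, retrieved_emb], axis=0)
augmented = self_attention(joint, d_k=d)
print(augmented.shape)  # (9, 8)
```

In the full model each of the 9 output rows would then feed the encoder, which the abstract says adaptively combines structural (AST/shortest-path) and sequential information.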
Pages: 51155-51165
Page count: 11