READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization

Cited by: 3
Authors
Choi, Yunseok [1 ]
Na, Cheolwon [2 ]
Kim, Hyojun [2 ]
Lee, Jee-Hyong [2 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Platform Software, Suwon 16419, South Korea
[2] Sungkyunkwan Univ, Dept Artificial Intelligence, Suwon 16419, South Korea
Keywords
Codes; Transformers; Source coding; Syntactics; Data mining; Task analysis; Adaptive systems; Shortest path problem; Abstract syntax tree; adaptive transformer; source code summarization; fusion network; shortest path
DOI
10.1109/ACCESS.2023.3271992
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Code summarization is the process of automatically generating brief and informative summaries of source code to aid in software comprehension and maintenance. In this paper, we propose a novel model called READSUM, REtrieval-augmented ADaptive transformer for source code SUMmarization, that combines both abstractive and extractive approaches. Our proposed model generates code summaries in an abstractive manner, taking into account both the structural and sequential information of the input code, while also utilizing an extractive approach that leverages a retrieved summary of similar code to increase the frequency of important keywords. To effectively blend the original code and the retrieved similar code at the embedding layer stage, we obtain the augmented representation of the original code and the retrieved code through multi-head self-attention. In addition, we develop a self-attention network that adaptively learns the structural and sequential information for the representations in the encoder stage. Furthermore, we design a fusion network to capture the relation between the original code and the retrieved summary at the decoder stage. The fusion network effectively guides summary generation based on the retrieved summary. Finally, READSUM extracts important keywords using an extractive approach and generates high-quality summaries using an abstractive approach that considers both the structural and sequential information of the source code. We demonstrate the superiority of READSUM through various experiments and an ablation study. Additionally, we perform a human evaluation to assess the quality of the generated summary.
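The abstract describes two fusion mechanisms only in architectural terms. The PyTorch sketch below shows one plausible reading of them: (1) blending the original code and the retrieved similar code at the embedding layer via multi-head self-attention, and (2) a decoder-side gated fusion between the decoder state and the retrieved summary. All class names, tensor shapes, and the gating formulation are assumptions of this sketch, not the authors' released implementation.

```python
# Minimal sketch of the two fusion ideas described in the abstract.
# Hypothetical module/variable names; NOT the authors' implementation.
import torch
import torch.nn as nn


class RetrievalAugmentedEmbedding(nn.Module):
    """Blend token embeddings of the input code with those of a
    retrieved similar code snippet via multi-head self-attention."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, code_ids: torch.Tensor, retrieved_ids: torch.Tensor):
        code = self.embed(code_ids)            # (B, Lc, d)
        retrieved = self.embed(retrieved_ids)  # (B, Lr, d)
        # Self-attend over the concatenation so every token of the
        # original code can also attend to the retrieved code.
        joint = torch.cat([code, retrieved], dim=1)
        augmented, _ = self.attn(joint, joint, joint)
        # Keep only the positions of the original code as its
        # retrieval-augmented representation.
        return augmented[:, : code.size(1), :]


class SummaryFusionGate(nn.Module):
    """Gated fusion of the decoder state with an encoding of the
    retrieved summary, so retrieved keywords can guide generation."""

    def __init__(self, d_model: int = 512):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, dec_state: torch.Tensor, retr_summary: torch.Tensor):
        # dec_state: (B, Lt, d); pool the retrieved summary to (B, 1, d)
        # and broadcast it across the target positions.
        pooled = retr_summary.mean(dim=1, keepdim=True).expand_as(dec_state)
        g = torch.sigmoid(self.gate(torch.cat([dec_state, pooled], dim=-1)))
        return g * dec_state + (1.0 - g) * pooled


# Toy usage with random token ids (hypothetical vocabulary of 1000).
emb = RetrievalAugmentedEmbedding(vocab_size=1000)
fuse = SummaryFusionGate()
code_ids = torch.randint(0, 1000, (2, 30))
retr_ids = torch.randint(0, 1000, (2, 25))
aug = emb(code_ids, retr_ids)        # (2, 30, 512)
dec_state = torch.randn(2, 10, 512)  # stand-in for decoder states
retr_sum = torch.randn(2, 12, 512)   # stand-in for retrieved-summary encoding
out = fuse(dec_state, retr_sum)      # (2, 10, 512)
```

The sigmoid gate lets the model weigh, per target position, how much to rely on its own decoder state versus the retrieved summary; READSUM's actual fusion network may realize this differently.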
Pages: 51155-51165
Page count: 11