Function Call Graph Context Encoding for Neural Source Code Summarization

Times cited: 10
Authors
Bansal, Aakash [1, 2]
Eberhart, Zachary [1, 2]
Karas, Zachary [1, 2]
Huang, Yu [1, 2]
McMillan, Collin [1, 2]
Affiliations
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[2] Univ Vanderbilt, Dept Comp Sci, Nashville, TN USA
Keywords
Codes; Source coding; Context modeling; Decoding; Algorithms; Software engineering; Machine translation; Automatic documentation generation; context-aware models; neural networks; source code summarization; PROGRAM COMPREHENSION; SOFTWARE MAINTENANCE; MENTAL MODELS; INFORMATION
DOI
10.1109/TSE.2023.3279774
Chinese Library Classification number
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
Source code summarization is the task of writing natural language descriptions of source code. The primary use of these descriptions is in documentation for programmers. Automatic generation of these descriptions is a high-value research target because writing them by hand costs programmers considerable time. In recent years, a confluence of software engineering and artificial intelligence research has made inroads into automatic source code summarization through neural models of source code. However, an Achilles' heel of the vast majority of approaches is that they rely solely on the context provided by the source code being summarized. Yet empirical studies in program comprehension are clear that the information needed to describe code very often resides in the context surrounding that code, in the form of the function call graph. In this paper, we present a technique for encoding this call graph context for neural models of code summarization. We implement our approach as a supplement to existing approaches and show a statistically significant improvement over them. In a human study with 20 programmers, we show that programmers perceive the generated summaries to be generally as accurate, readable, and concise as human-written summaries.
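The abstract does not spell out how call graph context is gathered before it is encoded. As a minimal sketch, assuming the call graph is available as a mapping from each function name to the names of the functions it calls, the snippet below collects every function within a fixed number of hops of the target, counting callers and callees alike. The function name `call_graph_context` and the `max_hops` parameter are hypothetical illustrations, not the authors' implementation, and the neural encoding of these neighbors is not shown.

```python
from collections import deque


def call_graph_context(graph: dict[str, set[str]], target: str, max_hops: int = 2) -> dict[str, int]:
    """Hypothetical helper: map each function within max_hops of target to its hop distance."""
    # Treat call edges as undirected so that both callers and callees count as context.
    undirected: dict[str, set[str]] = {}
    for caller, callees in graph.items():
        undirected.setdefault(caller, set())
        for callee in callees:
            undirected[caller].add(callee)
            undirected.setdefault(callee, set()).add(caller)

    # Breadth-first search from the target, stopping expansion at max_hops.
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for neighbor in sorted(undirected.get(node, ())):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)

    # The target itself is what is being summarized, not part of its context.
    del dist[target]
    return dist


graph = {"main": {"parse", "run"}, "run": {"step"}, "step": {"log"}, "parse": set()}
print(call_graph_context(graph, "run", max_hops=1))  # {'main': 1, 'step': 1}
```

With `max_hops=2`, the context for `run` grows to include `parse` (via its caller `main`) and `log` (via its callee `step`). Keeping the hop distance lets a downstream model weight near neighbors differently from distant ones, though whether the paper does so is not stated in this record.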
Pages: 4268-4281
Page count: 14
Number of references: 78