Automatic Code Documentation Generation Using GPT-3

Cited by: 34
Authors
Khan, Junaed Younus [1]
Uddin, Gias [1]
Affiliations
[1] Univ Calgary, DISA Lab, Calgary, AB, Canada
Source
PROCEEDINGS OF THE 37TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE 2022 | 2022
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
code documentation; GPT-3; Machine Learning;
DOI
10.1145/3551349.3559548
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Source code documentation is an important artifact for efficient software development. Code documentation could greatly benefit from automation, since manual documentation is often labour-, resource-, and time-intensive. In this paper, we employed Codex for automatic code documentation generation. Codex is a GPT-3-based model pre-trained on both natural and programming languages. We find that Codex outperforms existing techniques even with basic settings such as one-shot learning (i.e., providing only one example for training). Codex achieves an overall BLEU score of 20.6 across six different programming languages (an 11.2% improvement over earlier state-of-the-art techniques). Thus, Codex shows promise and warrants in-depth future studies of automatic code documentation generation to support diverse development tasks.
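The one-shot setting mentioned in the abstract can be illustrated with a small sketch: a single worked (code, documentation) pair is placed ahead of the target code, and the model is left to complete the final documentation line. The prompt template, the function name `build_one_shot_prompt`, and the sample snippets are assumptions made for illustration; they are not taken from the paper, and the call to the model itself is deliberately left out.

```python
def build_one_shot_prompt(example_code, example_doc, target_code, language="Python"):
    """Assemble a one-shot prompt: one worked example, then the code to document.

    The template below is a hypothetical layout, not the paper's actual prompt.
    """
    return (
        f"{language} code:\n{example_code.strip()}\n"
        f"Documentation: {example_doc.strip()}\n\n"
        f"{language} code:\n{target_code.strip()}\n"
        "Documentation:"  # left open so the model completes it
    )


# Illustrative inputs only; any (code, documentation) pair could serve as the example.
prompt = build_one_shot_prompt(
    example_code="def add(a, b):\n    return a + b",
    example_doc="Return the sum of two numbers.",
    target_code="def is_even(n):\n    return n % 2 == 0",
)
print(prompt)
```

The resulting string would be sent to a code-capable language model as the completion prompt; how the example pair is chosen and how the output is scored (for instance with BLEU) are outside this sketch.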
Pages: 6