A unified multi-task learning model for AST-level and token-level code completion

Cited by: 19
Authors
Liu, Fang [1,2]
Li, Ge [1,2]
Wei, Bolin [1,2]
Xia, Xin [3]
Fu, Zhiyi [1,2]
Jin, Zhi [1,2]
Affiliations
[1] Peking Univ, Minist Educ, Key Lab High Confidence Software Technol, Beijing, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[3] Huawei, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Code completion; Deep learning; Multi-task learning;
DOI
10.1007/s10664-022-10140-7
CLC Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Code completion, one of the most useful features in Integrated Development Environments (IDEs), can accelerate software development by suggesting the next probable tokens in real time based on the existing code. Recent studies have shown that statistical language models based on recurrent neural networks can improve the performance of code completion tools by learning from large-scale software repositories. However, most existing approaches treat code completion as a single generation task, in which the model predicts the values of tokens or AST nodes from the contextual source code without considering syntactic constraints such as static type information. Moreover, semantic dependencies in programs can span long distances, and existing recurrent neural network-based language models are insufficient to capture such long-term dependencies. In this paper, we address these limitations by building a unified multi-task learning-based code completion model for both AST-level and token-level code completion. To model the relationships and constraints between the types and values of code elements, we adopt a multi-task learning framework that predicts the type and value of each token (AST node) simultaneously. To capture long-term dependencies in the input programs, we employ a network based on a self-attentional architecture as the base language model. We apply our approach to both AST-level and token-level code completion. Experimental results demonstrate the effectiveness of our model compared with state-of-the-art methods.
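The multi-task design described in the abstract can be illustrated with a small sketch. The following is a minimal PyTorch example, not the authors' implementation: it assumes a Transformer encoder with a causal mask as the shared self-attentional backbone and two linear heads that predict the next node's type and value from the same hidden states. All module names, vocabulary sizes, and hyperparameters are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskCompletionModel(nn.Module):
    """Shared self-attentional backbone with two task-specific heads
    that jointly predict the type and the value of the next AST node."""

    def __init__(self, n_types=100, n_values=5000, d_model=256,
                 n_heads=8, n_layers=6, max_len=1024):
        super().__init__()
        self.type_emb = nn.Embedding(n_types, d_model)
        self.value_emb = nn.Embedding(n_values, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.type_head = nn.Linear(d_model, n_types)    # task 1: next node type
        self.value_head = nn.Linear(d_model, n_values)  # task 2: next node value

    def forward(self, types, values):
        # Each input position is the sum of type, value, and position embeddings.
        seq_len = types.size(1)
        pos = torch.arange(seq_len, device=types.device)
        x = self.type_emb(types) + self.value_emb(values) + self.pos_emb(pos)
        # Causal mask: each position may attend only to earlier positions.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=types.device), diagonal=1)
        h = self.backbone(x, mask=mask)
        return self.type_head(h), self.value_head(h)

# Joint objective: the two task losses are summed, so the shared backbone
# receives gradients from both prediction tasks.
model = MultiTaskCompletionModel()
types = torch.randint(0, 100, (2, 16))    # toy batch of AST node type ids
values = torch.randint(0, 5000, (2, 16))  # matching AST node value ids
type_logits, value_logits = model(types[:, :-1], values[:, :-1])
loss = (F.cross_entropy(type_logits.reshape(-1, 100), types[:, 1:].reshape(-1)) +
        F.cross_entropy(value_logits.reshape(-1, 5000), values[:, 1:].reshape(-1)))
loss.backward()

Summing the two losses is the simplest joint objective: the shared backbone is updated by both tasks, which is what allows type prediction to constrain value prediction and vice versa.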
Pages: 38