A unified multi-task learning model for AST-level and token-level code completion

Cited by: 19
Authors
Liu, Fang [1 ,2 ]
Li, Ge [1 ,2 ]
Wei, Bolin [1 ,2 ]
Xia, Xin [3 ]
Fu, Zhiyi [1 ,2 ]
Jin, Zhi [1 ,2 ]
Affiliations
[1] Peking Univ, Minist Educ, Key Lab High Confidence Software Technol, Beijing, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[3] Huawei, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Code completion; Deep learning; Multi-task learning;
DOI
10.1007/s10664-022-10140-7
CLC classification number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Code completion, one of the most useful features in Integrated Development Environments (IDEs), can accelerate software development by suggesting the next probable tokens in real time based on the existing code. Recent studies have shown that statistical language models based on recurrent neural networks can improve the performance of code completion tools by learning from large-scale software repositories. However, most existing approaches treat code completion as a single generation task, in which the model predicts the values of tokens or AST nodes from the surrounding source code without considering syntactic constraints such as static type information. Moreover, semantic relationships in programs can span long distances, and existing language models based on recurrent neural networks are insufficient to capture such long-term dependencies. In this paper, we address these limitations by building a unified multi-task learning based code completion model for both AST-level and token-level code completion. To model the relationships and constraints between the types and values of code elements, we adopt a multi-task learning framework that predicts the type and value of each token (AST node) simultaneously. To capture long-term dependencies in the input programs, we employ a network based on a self-attentional architecture as the base language model. We apply our approach to both AST-level and token-level code completion, and experimental results demonstrate the effectiveness of our model compared with state-of-the-art methods.
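To make the described architecture concrete, the following is a minimal PyTorch sketch (not the authors' code) of such a multi-task completion model: a shared self-attentional encoder feeds two output heads that jointly predict the type and the value of the next token (AST node), trained with a weighted sum of the two cross-entropy losses. All module names, dimensions, and the loss weight alpha are illustrative assumptions.

import torch
import torch.nn as nn

class MultiTaskCompletionModel(nn.Module):
    # Hypothetical sketch: shared self-attentional layers with two task heads.
    def __init__(self, value_vocab, type_vocab, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(value_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # shared context encoder
        self.type_head = nn.Linear(d_model, type_vocab)         # task 1: next node type
        self.value_head = nn.Linear(d_model, value_vocab)       # task 2: next node value

    def forward(self, tokens):
        # Causal mask so each position attends only to earlier context.
        n = tokens.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool, device=tokens.device), 1)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.type_head(h), self.value_head(h)

def joint_loss(type_logits, value_logits, type_targets, value_targets, alpha=0.5):
    # Multi-task objective: weighted sum of the per-task losses (alpha is assumed).
    ce = nn.CrossEntropyLoss()
    return alpha * ce(type_logits.transpose(1, 2), type_targets) + \
           (1 - alpha) * ce(value_logits.transpose(1, 2), value_targets)

Sharing the encoder lets the type and value prediction tasks regularize each other, which is the core idea of the multi-task setup summarized in the abstract.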
Pages: 38