A unified multi-task learning model for AST-level and token-level code completion

Cited by: 19
Authors
Liu, Fang [1 ,2 ]
Li, Ge [1 ,2 ]
Wei, Bolin [1 ,2 ]
Xia, Xin [3 ]
Fu, Zhiyi [1 ,2 ]
Jin, Zhi [1 ,2 ]
Affiliations
[1] Peking Univ, Minist Educ, Key Lab High Confidence Software Technol, Beijing, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[3] Huawei, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Code completion; Deep learning; Multi-task learning;
DOI
10.1007/s10664-022-10140-7
Chinese Library Classification
TP31 [Computer Software];
Discipline Classification Code
081202 ; 0835 ;
Abstract
Code completion, one of the most useful features in Integrated Development Environments (IDEs), can accelerate software development by suggesting the next probable tokens based on existing code in real time. Recent studies have shown that statistical language models based on recurrent neural networks can improve the performance of code completion tools by learning from large-scale software repositories. However, most existing approaches treat code completion as a single generation task in which the model predicts the value of the tokens or AST nodes from the contextual source code, without considering syntactic constraints such as static type information. Moreover, the semantic relationships in programs can span very long distances, and existing recurrent neural network based language models are not sufficient to model such long-term dependencies. In this paper, we tackle these limitations by building a unified multi-task learning based code completion model for both AST-level and token-level code completion. To model the relationship and constraints between the type and value of code elements, we adopt a multi-task learning framework that predicts the type and value of the tokens (AST nodes) simultaneously. To capture long-term dependencies in the input programs, we employ a self-attentional architecture as the base language model. We apply our approach to both AST-level and token-level code completion. Experimental results demonstrate the effectiveness of our model compared with state-of-the-art methods.
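The multi-task objective described in the abstract (jointly predicting the type and value of each token from one shared representation) can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: all dimensions and data are toy assumptions, and the shared encoder output is stubbed with random values in place of the actual self-attentional network.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: shared hidden dim, type/value vocab sizes, sequence length.
d_model, n_types, n_values, seq_len = 16, 5, 50, 4

# Stand-in for the shared encoder output (the self-attentional base
# language model in the paper) for one input sequence.
h = rng.standard_normal((seq_len, d_model))

# Two task-specific linear heads over the SAME shared representation:
# one predicts the node type, the other predicts the node value.
W_type = rng.standard_normal((d_model, n_types))
W_value = rng.standard_normal((d_model, n_values))

p_type = softmax(h @ W_type)
p_value = softmax(h @ W_value)

# Toy gold labels for each position.
y_type = rng.integers(0, n_types, size=seq_len)
y_value = rng.integers(0, n_values, size=seq_len)

# Multi-task objective: the sum of the two cross-entropy losses, so that
# gradients from both prediction tasks flow into the shared encoder.
loss_type = -np.log(p_type[np.arange(seq_len), y_type]).mean()
loss_value = -np.log(p_value[np.arange(seq_len), y_value]).mean()
loss = loss_type + loss_value
print(float(loss))
```

The key design point is that the two heads share `h`: training on the type task regularizes value prediction (and vice versa), which is what lets the model exploit the type/value constraints the single-task baselines ignore.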
Pages: 38
Cited References
57 items
[41]  
Maddison CJ, 2014, PR MACH LEARN RES, V32, P649
[42]   NL2Type: Inferring JavaScript Function Types from Natural Language Information [J].
Malik, Rabee Sohail ;
Patra, Jibesh ;
Pradel, Michael .
2019 IEEE/ACM 41ST INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2019), 2019, :304-315
[43]  
Nguyen Tung Thanh, P 2013 9 JOINT M FDN, P532, DOI 10.1145/2491411.2491458
[44]  
Peng Nanyun, 2017, P 2 WORKSHOP REPRESE, P91, DOI 10.18653/v1/W17-2612
[45]   Probabilistic Model for Code with Decision Trees [J].
Raychev, Veselin ;
Bielik, Pavol ;
Vechev, Martin .
ACM SIGPLAN NOTICES, 2016, 51 (10) :731-747
[46]  
Robbes Romain, 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, P317, DOI 10.1109/ASE.2008.42
[47]  
Ruder Sebastian, 2017, CoRR
[48]   Bidirectional recurrent neural networks [J].
Schuster, M ;
Paliwal, KK .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1997, 45 (11) :2673-2681
[49]  
Svyatkovskiy A, 2019, KDD'19: PROCEEDINGS OF THE 25TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P2727, DOI 10.1145/3292500.3330699
[50]  
Svyatkovskiy A., 2020, arXiv:2004.13651