CRaDLe: Deep code retrieval based on semantic Dependency Learning

Cited by: 31
Authors
Gu, Wenchao [1]
Li, Zongjie [2]
Gao, Cuiyun [2]
Wang, Chaozheng [2]
Zhang, Hongyu [3]
Xu, Zenglin [2]
Lyu, Michael R. [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[3] Univ Newcastle, Newcastle, NSW, Australia
Funding
National Natural Science Foundation of China; Australian Research Council;
Keywords
Code retrieval; Semantic dependency; Dependency learning; Neural network; GRAPH;
DOI
10.1016/j.neunet.2021.04.019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code retrieval is a common practice for programmers to reuse existing code snippets in the open source repositories. Given a user query (i.e., a natural language description), code retrieval aims at searching the most relevant ones from a set of code snippets. The main challenge of effective code retrieval lies in mitigating the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies resort to neural networks to learn the semantic matching relationships between the two sources. The statement-level dependency information, which highlights the dependency relations among the program statements during the execution, reflects the structural importance of one statement in the code, which is favorable for accurately capturing the code semantics but has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations through fusing both the dependency and semantic information at the statement level, and then learns a unified vector representation for each code and description pair for modeling the matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach can accurately retrieve code snippets for a given query and significantly outperform the state-of-the-art approaches on the task. © 2021 Elsevier Ltd. All rights reserved.
Pages: 385-394
Page count: 10