Cross-Domain Deep Code Search with Meta Learning

Cited by: 0
Authors
Chai, Yitian [1 ]
Zhang, Hongyu [2 ]
Shen, Beijun [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai, Peoples R China
[2] Univ Newcastle, Newcastle, NSW, Australia
Source
2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Code Search; Pre-trained Code Models; Meta Learning; Few-Shot Learning; Deep Learning;
DOI
Not available
CLC Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their success, they rely on the availability of large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality in domain-specific languages, where data are relatively scarce and expensive to obtain. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is further adapted to domain-specific languages such as Solidity and SQL. Unlike cross-language CodeBERT, which is directly fine-tuned on the target language, CDCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of model parameters that can be effectively reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, namely Solidity and SQL, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are directly fine-tuned on domain-specific languages, and that it is particularly effective when data are scarce.
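The meta-learning step described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' released implementation: it assumes a PyTorch bi-encoder `encoder` that maps a tokenized batch to a (query_vecs, code_vecs) pair, uses an in-batch contrastive (InfoNCE-style) loss, and applies the first-order approximation of MAML for brevity, whereas CDCS adapts MAML proper on top of a pre-trained code model such as CodeBERT. Names such as `fomaml_step`, `info_nce`, `tasks`, and `inner_lr` are illustrative.

```python
import copy

import torch
import torch.nn.functional as F


def info_nce(query_vecs, code_vecs, temperature=0.05):
    """In-batch contrastive loss: the i-th query should match the i-th code snippet."""
    sims = query_vecs @ code_vecs.t() / temperature
    labels = torch.arange(sims.size(0), device=sims.device)
    return F.cross_entropy(sims, labels)


def fomaml_step(encoder, meta_optimizer, tasks, inner_lr=1e-5, inner_steps=1):
    """One meta-update over a batch of tasks sampled from the source languages.

    Each task is a (support_batch, query_batch) pair of code-search mini-batches.
    First-order MAML: gradients of the query-set loss with respect to the adapted
    parameters are applied directly to the meta-parameters.
    """
    meta_optimizer.zero_grad()
    for support_batch, query_batch in tasks:
        learner = copy.deepcopy(encoder)                      # task-specific copy
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)

        for _ in range(inner_steps):                          # inner-loop adaptation
            q_vecs, c_vecs = learner(support_batch)
            inner_loss = info_nce(q_vecs, c_vecs)
            inner_opt.zero_grad()
            inner_loss.backward()
            inner_opt.step()

        q_vecs, c_vecs = learner(query_batch)                 # evaluate adapted parameters
        meta_loss = info_nce(q_vecs, c_vecs)
        grads = torch.autograd.grad(meta_loss, tuple(learner.parameters()))

        for param, grad in zip(encoder.parameters(), grads):  # accumulate meta-gradients
            param.grad = grad if param.grad is None else param.grad + grad

    meta_optimizer.step()                                     # outer-loop update
```

In this sketch, tasks would be sampled from the common languages (e.g., Java and Python); after meta-training, the learned initialization would then be fine-tuned on the small domain-specific corpus (e.g., Solidity or SQL) in the usual way.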
Pages: 487-498
Number of pages: 12