Cross-Domain Deep Code Search with Meta Learning

Cited by: 0
Authors
Chai, Yitian [1 ]
Zhang, Hongyu [2 ]
Shen, Beijun [1 ]
Gu, Xiaodong [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Software, Shanghai, Peoples R China
[2] Univ Newcastle, Newcastle, NSW, Australia
Source
2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Code Search; Pre-trained Code Models; Meta Learning; Few-Shot Learning; Deep Learning;
DOI
Not available
CLC Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their success, they rely on the availability of large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality in domain-specific languages, where data are relatively scarce and expensive to obtain. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is further adapted to domain-specific languages such as Solidity and SQL. Unlike cross-language CodeBERT, which is directly fine-tuned on the target language, CDCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of model parameters that can be effectively reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, namely Solidity and SQL, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are directly fine-tuned on domain-specific languages, and that it is particularly effective when data are scarce.
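The meta-learning step described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' released implementation: it assumes a PyTorch bi-encoder `encoder` that maps a tokenized batch to a (query_vecs, code_vecs) pair, uses an in-batch contrastive (InfoNCE-style) loss, and applies the first-order approximation of MAML for brevity, whereas CDCS adapts MAML proper on top of a pre-trained code model such as CodeBERT. Names such as `fomaml_step`, `info_nce`, `tasks`, and `inner_lr` are illustrative.

```python
import copy

import torch
import torch.nn.functional as F


def info_nce(query_vecs, code_vecs, temperature=0.05):
    """In-batch contrastive loss: the i-th query should match the i-th code snippet."""
    sims = query_vecs @ code_vecs.t() / temperature
    labels = torch.arange(sims.size(0), device=sims.device)
    return F.cross_entropy(sims, labels)


def fomaml_step(encoder, meta_optimizer, tasks, inner_lr=1e-5, inner_steps=1):
    """One meta-update over a batch of tasks sampled from the source languages.

    Each task is a (support_batch, query_batch) pair of code-search mini-batches.
    First-order MAML: gradients of the query-set loss with respect to the adapted
    parameters are applied directly to the meta-parameters.
    """
    meta_optimizer.zero_grad()
    for support_batch, query_batch in tasks:
        learner = copy.deepcopy(encoder)                      # task-specific copy
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)

        for _ in range(inner_steps):                          # inner-loop adaptation
            q_vecs, c_vecs = learner(support_batch)
            inner_loss = info_nce(q_vecs, c_vecs)
            inner_opt.zero_grad()
            inner_loss.backward()
            inner_opt.step()

        q_vecs, c_vecs = learner(query_batch)                 # evaluate adapted parameters
        meta_loss = info_nce(q_vecs, c_vecs)
        grads = torch.autograd.grad(meta_loss, tuple(learner.parameters()))

        for param, grad in zip(encoder.parameters(), grads):  # accumulate meta-gradients
            param.grad = grad if param.grad is None else param.grad + grad

    meta_optimizer.step()                                     # outer-loop update
```

In this sketch, tasks would be sampled from the common languages (e.g., Java and Python); after meta-training, the learned initialization would then be fine-tuned on the small domain-specific corpus (e.g., Solidity or SQL) in the usual way.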
Pages: 487-498
Number of pages: 12