FMCS: Improving Code Search by Multi-Modal Representation Fusion and Momentum Contrastive Learning

Cited by: 0
Authors
Liu, Wenjie [1]
Chen, Gong [1]
Xie, Xiaoyuan [1]
Affiliations
[1] Wuhan University, School of Computer Science, Wuhan, Hubei, People's Republic of China
Source
2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS), 2024
Keywords
code search; contrastive learning; multi-modal models; data augmentation
DOI
10.1109/QRS62785.2024.00068
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Code search is a critical task in software engineering: retrieving relevant code from a codebase given a natural language query. Although existing code search methods based on multi-modal contrastive learning have achieved strong performance, they remain limited in how they learn representations of multi-modal data, and they do not sufficiently explore the role of functionally equivalent code pairs in representation learning. To address these limitations, we propose FMCS, a code search framework based on multi-modal representation fusion and momentum contrastive learning. Multi-modal representation fusion retains both the semantic and the structural information of the code, and momentum contrastive learning between samples further captures the correlation between relevant samples. Experimental results on the CodeSearchNet benchmark show the effectiveness of FMCS.
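The abstract names momentum contrastive learning without giving details. The sketch below illustrates the general technique only: a key encoder updated as a moving average of the query encoder, a queue of past keys used as negatives, and an InfoNCE loss. It is not the FMCS implementation. All function names, the momentum value, the temperature, and the queue handling are assumptions chosen for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def momentum_update(query_params, key_params, m=0.999):
    """Move each key-encoder parameter slowly toward the query encoder.

    key <- m * key + (1 - m) * query. The value of m is an assumption.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE loss for one query: -log softmax of the positive similarity.

    Similarities are cosine similarities divided by a temperature
    (0.07 is an assumed value, not taken from the paper).
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(n)) / temperature for n in negative_keys]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def update_queue(queue, new_keys, max_size):
    """Append the newest keys and keep only the most recent max_size entries."""
    return (queue + new_keys)[-max_size:]
```

In a code search setting, the query vector would typically come from the natural language query and the keys from code snippets, so the loss is small when a query lies close to its matching code and far from the queued negatives.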
Pages: 632-638
Page count: 7