A mutual embedded self-attention network model for code search

Cited by: 3
Authors
Hu, Haize [1]
Liu, Jianxun [1]
Zhang, Xiangping
Cao, Ben
Cheng, Siqiang
Long, Teng
Affiliations
[1] North Second Ring Rd, Xiangtan, Hunan, Peoples R China
Keywords
Code search; Code segments; Machine learning; Self-attention; MESN-CS
DOI
10.1016/j.jss.2022.111591
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
To improve the efficiency of program implementation, developers can selectively reuse previously written code by searching open-source codebases. Many code search methods have been proposed to push the limit of code search accuracy, and those built on the self-attention mechanism are particularly promising. However, while existing methods capture textual semantics more efficiently by attending to significant words within a code component unit, they typically fail to capture the structural dependencies between code components, which may lead to suboptimal search accuracy. In this paper, we propose a novel self-attention model, termed MESN-CS, that considers both word-level attention and code unit-level attention for code search. MESN-CS calculates not only the attention weight of each word within a code component unit but also the weight of the embedding between code component units. To verify the effectiveness of the proposed model, it was compared with three benchmark models on a large-scale code dataset and CodeSearchNet. The experimental results show that MESN-CS achieves better Recall@k, NDCG, and MRR than the baseline methods. The experiments also show that MESN-CS can effectively characterize the semantic and syntactic information between sequences. © 2022 Elsevier Inc. All rights reserved.
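For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how Recall@k, MRR, and NDCG are commonly computed for code search. It assumes that each query has exactly one correct code snippet and that `ranks` holds the 1-based position of that snippet in each query's result list. The function names and the sample ranks are hypothetical, and the paper's exact definitions may differ.

```python
import math
from typing import List


def recall_at_k(ranks: List[int], k: int) -> float:
    """Fraction of queries whose correct snippet appears in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank: average of 1/rank of the correct snippet."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def ndcg(ranks: List[int]) -> float:
    """NDCG under the single-relevant-item assumption.

    With one relevant item per query the ideal DCG is 1, so the
    per-query NDCG reduces to 1 / log2(rank + 1).
    """
    return sum(1.0 / math.log2(r + 1) for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks of the correct snippet for four queries.
    sample_ranks = [1, 3, 2, 10]
    print("Recall@1:", recall_at_k(sample_ranks, 1))
    print("Recall@5:", recall_at_k(sample_ranks, 5))
    print("MRR:", mrr(sample_ranks))
    print("NDCG:", ndcg(sample_ranks))
```

All three metrics lie in the range 0 to 1, and higher is better. MRR and NDCG reward placing the correct snippet near the top, while Recall@k only checks whether it appears within the first k results.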
Pages: 15