Semantic Clone Detection Based on Code Feature Fusion Learning

Cited by: 1
Authors
Zhang, Qianjin [1, 2]
Jin, Dahai [1, 2]
Wang, Yawen [2]
Gong, Yunzhan [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] Guangxi Key Lab Cryptog & Informat Secur, Guilin 541004, Guangxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code clone detection; code representation learning; code semantic understanding; graph neural network
DOI
10.1142/S0218194023500249
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Code clones are duplicated code snippets that significantly threaten software maintenance and the public corpora used for code representation learning. Traditionally, code context and structural information such as the abstract syntax tree (AST) and control flow graph (CFG) are typical representations of source code, and context-based and structure-based models have contributed significantly to the development of code clone detection. In this paper, we present a hybrid embedding model for code clone detection (HEM-CCD), which fuses token sequential information with graph-based structure information. We insert the tokens' global context information, encoded by a bi-directional recurrent neural network, into the AST-based graph to obtain a comprehensive representation of code semantics. Then, feeding the graph into a gated graph neural network, we generate code semantic vectors for similarity evaluation. We have implemented our model on two public clone datasets (BigCloneBench and GoogleCodeJam), and the results indicate that HEM-CCD outperforms several state-of-the-art approaches.
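
The data flow described in the abstract (AST-based graph, propagation over the graph, pooled semantic vector, similarity evaluation) can be illustrated with a toy sketch. This is not the HEM-CCD implementation, and every name in it (`DIM`, `label_vector`, `build_ast_graph`, `propagate`, `embed`, `cosine`) is hypothetical. It makes several loud assumptions: it parses Python with the standard `ast` module instead of the languages used in the paper's datasets, it uses hash-based pseudo-embeddings instead of learned ones, it replaces the gated graph neural network with plain un-gated neighbor averaging, and it omits the bi-directional RNN token context entirely.

```python
import ast
import hashlib
import math

DIM = 16  # hypothetical vector size, chosen only for this sketch


def label_vector(label, dim=DIM):
    """Deterministic pseudo-embedding of a node label (stand-in for learned embeddings)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    return [(b / 255.0) - 0.5 for b in digest[:dim]]


def build_ast_graph(source):
    """Return (features, neighbors) for the AST of a Python snippet.

    Each visited node gets its own index and an undirected edge to its parent.
    """
    features, neighbors = [], []

    def visit(node, parent_idx):
        idx = len(features)
        features.append(label_vector(type(node).__name__))
        neighbors.append([])
        if parent_idx is not None:
            neighbors[parent_idx].append(idx)
            neighbors[idx].append(parent_idx)
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return features, neighbors


def propagate(features, neighbors, steps=3):
    """Un-gated message passing: blend each node with the mean of its neighbors."""
    h = [list(v) for v in features]
    for _ in range(steps):
        new_h = []
        for i, vec in enumerate(h):
            if neighbors[i]:
                k = len(neighbors[i])
                msg = [sum(h[j][d] for j in neighbors[i]) / k for d in range(len(vec))]
            else:
                msg = vec  # isolated node: nothing to aggregate
            new_h.append([(a + b) / 2.0 for a, b in zip(vec, msg)])
        h = new_h
    return h


def embed(source, steps=3):
    """Mean-pool the propagated node states into one vector per snippet."""
    features, neighbors = build_ast_graph(source)
    h = propagate(features, neighbors, steps)
    return [sum(v[d] for v in h) / len(h) for d in range(DIM)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    a = embed("def f(a):\n    return a + 1\n")
    b = embed("def g(x):\n    return x + 1\n")
    c = embed("for i in range(3):\n    print(i)\n")
    print(cosine(a, b), cosine(a, c))
```

Because this sketch labels nodes by AST node type only, two snippets that differ just in identifier names produce identical graphs and therefore identical vectors. That is exactly the kind of information loss the abstract's fusion of token context into the graph is meant to address.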
Pages: 1039-1062
Page count: 24