Fine-Grained Code Clone Detection with Block-Based Splitting of Abstract Syntax Tree

Cited by: 9
Authors
Hu, Tiancheng [1, 2]
Xu, Zijing [1, 2]
Fang, Yilin [1, 2]
Wu, Yueming [3]
Yuan, Bin [1, 2]
Zou, Deqing [1, 2]
Jin, Hai [2, 4]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Hubei Key Lab Distributed Syst Secur, Hubei Engn Res Ctr Big Data Secur,Cluster & Grid, Natl Engn Res Ctr Big Data Technol & Syst,Serv Co, Wuhan, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Huazhong Univ Sci & Technol, Sch Comp Sci, Wuhan 430074, Peoples R China
Source
PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023
Keywords
Clone Detection; Abstract Syntax Tree; Fine-grained; Splitting; GRAPH;
DOI
10.1145/3597926.3598040
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Code clone detection aims to find similar code fragments and is gaining importance in software engineering. Several types of techniques exist for detecting code clones. Text-based and token-based detectors are scalable and efficient but do not consider syntax, so they perform poorly on syntactic code clones. Some tree-based methods detect syntactic or semantic code clones with decent performance, but they are mostly time-consuming and do not scale well. In addition, these methods cannot perform fine-grained code clone detection: they are unable to identify the concrete code blocks that are cloned. In this paper, we design Tamer, a scalable and fine-grained tree-based syntactic code clone detector. Specifically, we propose a novel method that transforms the complex abstract syntax tree into simple subtrees. This accelerates detection and enables fine-grained analysis of clone pairs to locate the concrete cloned parts of the code. To examine the detection performance and scalability of Tamer, we evaluate it on the widely used BigCloneBench dataset. Experimental results show that Tamer outperforms ten state-of-the-art code clone detection tools (i.e., CCAligner, SourcererCC, Siamese, NIL, NiCad, LVMapper, Deckard, Yang2018, CCFinder, and CloneWorks).
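The abstract does not give the details of Tamer's splitting procedure, so the following is only a rough sketch of the general idea of block-based AST splitting, not the authors' algorithm. It assumes Python's standard `ast` module as a stand-in parser (BigCloneBench itself is a Java dataset), and the names `BLOCK_TYPES`, `split_blocks`, `match_blocks`, and the 0.8 threshold are all hypothetical choices made for illustration. Each block-level statement becomes a small subtree summarized by its node-type counts, and blocks from two fragments are compared pairwise so that a match points at a concrete block rather than a whole function.

```python
import ast
from collections import Counter

# Statement types treated as block boundaries (an assumption for this sketch).
BLOCK_TYPES = (ast.FunctionDef, ast.For, ast.While, ast.If, ast.With, ast.Try)

def node_types(node):
    """Collect node-type names under `node` without descending into nested blocks."""
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, BLOCK_TYPES):
            # Keep a placeholder so the parent still records that a block sits here.
            names.append(type(child).__name__)
        else:
            names.extend(node_types(child))
    return names

def split_blocks(source):
    """Return (line number, node-type multiset) for every block in `source`."""
    tree = ast.parse(source)
    return [(n.lineno, Counter(node_types(n)))
            for n in ast.walk(tree) if isinstance(n, BLOCK_TYPES)]

def similarity(a, b):
    """Multiset Jaccard similarity between two node-type counters."""
    return sum((a & b).values()) / sum((a | b).values())

def match_blocks(src_a, src_b, threshold=0.8):
    """List (line_a, line_b, score) for block pairs scoring at least `threshold`."""
    return [(la, lb, similarity(ca, cb))
            for la, ca in split_blocks(src_a)
            for lb, cb in split_blocks(src_b)
            if similarity(ca, cb) >= threshold]

# Two fragments that differ only in identifier names.
A = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
"""
B = """
def acc(values):
    result = 0
    for v in values:
        if v > 0:
            result += v
    return result
"""

for line_a, line_b, score in match_blocks(A, B):
    print(f"block at line {line_a} matches block at line {line_b} ({score:.2f})")
```

Because identifiers are ignored and only node types are counted, renamed variables do not affect the score. A real detector would need a more discriminating subtree representation and an index to avoid the quadratic pairwise comparison used here.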
Pages: 89-100
Page count: 12