Fine-Grained Code Clone Detection with Block-Based Splitting of Abstract Syntax Tree

Cited by: 9
Authors
Hu, Tiancheng [1, 2]
Xu, Zijing [1, 2]
Fang, Yilin [1, 2]
Wu, Yueming [3]
Yuan, Bin [1, 2]
Zou, Deqing [1, 2]
Jin, Hai [2, 4]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Hubei Key Lab Distributed Syst Secur, Hubei Engn Res Ctr Big Data Secur,Cluster & Grid, Natl Engn Res Ctr Big Data Technol & Syst,Serv Co, Wuhan, Peoples R China
[3] Nanyang Technol Univ, Singapore, Singapore
[4] Huazhong Univ Sci & Technol, Sch Comp Sci, Wuhan 430074, Peoples R China
Source
PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023 | 2023
Keywords
Clone Detection; Abstract Syntax Tree; Fine-grained; Splitting; GRAPH;
DOI
10.1145/3597926.3598040
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Code clone detection aims to find similar code fragments and is gaining importance in software engineering. There are several types of techniques for detecting code clones. Text-based and token-based code clone detectors are scalable and efficient but do not consider syntax, and therefore perform poorly at detecting syntactic code clones. Although some tree-based methods have been proposed to detect syntactic or semantic code clones with decent performance, they are mostly time-consuming and lack scalability. In addition, these detection methods cannot perform fine-grained code clone detection: they are unable to identify the concrete code blocks that are cloned. In this paper, we design Tamer, a scalable and fine-grained tree-based syntactic code clone detector. Specifically, we propose a novel method to transform the complex abstract syntax tree into simple subtrees. This accelerates detection and enables fine-grained analysis of clone pairs to locate the concrete cloned parts of the code. To examine the detection performance and scalability of Tamer, we evaluate it on the widely used BigCloneBench dataset. Experimental results show that Tamer outperforms ten state-of-the-art code clone detection tools (i.e., CCAligner, SourcererCC, Siamese, NIL, NiCad, LVMapper, Deckard, Yang2018, CCFinder, and CloneWorks).
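The abstract gives no implementation details, so the following is only a rough illustration of the general idea of splitting an abstract syntax tree at block boundaries and comparing the resulting subtrees. It is not Tamer's algorithm. Everything in it is an assumption made for this sketch: the use of Python's standard `ast` module, the choice of block node types, the node-type multiset similarity, the 0.7 threshold, and the function names `split_into_blocks`, `similarity`, and `cloned_blocks`.

```python
import ast
from collections import Counter

# Node types treated as block boundaries (an assumption for this sketch).
BLOCK_TYPES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def split_into_blocks(source):
    """Split the AST of `source` into subtrees, cutting at block statements.

    Returns a list of (start_line, Counter of node-type names), one entry
    for the top-level skeleton and one per nested block.
    """
    blocks = []

    def collect(node):
        counts = Counter([type(node).__name__])
        for child in ast.iter_child_nodes(node):
            if isinstance(child, BLOCK_TYPES):
                # Cut here: the nested block becomes its own subtree.
                blocks.append((child.lineno, collect(child)))
            else:
                counts += collect(child)
        return counts

    blocks.append((1, collect(ast.parse(source))))
    return sorted(blocks, key=lambda block: block[0])


def similarity(a, b):
    """Multiset Jaccard similarity between two node-type counters."""
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 1.0


def cloned_blocks(src_a, src_b, threshold=0.7):
    """Return (line_in_a, line_in_b) pairs of subtrees that look similar."""
    pairs = []
    for line_a, counts_a in split_into_blocks(src_a):
        for line_b, counts_b in split_into_blocks(src_b):
            if similarity(counts_a, counts_b) >= threshold:
                pairs.append((line_a, line_b))
    return pairs


SRC_A = """def f(xs):
    total = 0
    for x in xs:
        total += x
    return total
"""

SRC_B = """def g(items):
    acc = 0
    for item in items:
        acc += item
    print(acc)
"""

if __name__ == "__main__":
    # Reports which blocks match, identified by their starting lines.
    print(cloned_blocks(SRC_A, SRC_B))
```

Because each subtree is compared separately, a match can be reported at the level of an individual block (here identified by its starting line) rather than only for a whole function pair, which is the kind of fine-grained result the abstract describes.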
Pages: 89-100
Page count: 12