You Look so Different: Finding Structural Clones and Subclones in Java Source Code

Cited by: 9
Authors
Amme, Wolfram [1]
Heinze, Thomas S. [2]
Schafer, Andre [1]
Affiliations
[1] Friedrich Schiller Univ Jena, Inst Comp Sci, Jena, Germany
[2] German Aerosp Ctr DLR, Inst Data Sci, Jena, Germany
Source
2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021) | 2021
DOI
10.1109/ICSME52107.2021.00013
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Code reuse and copying is a widespread practice in software development. Detecting code clones, i.e., identical or similar fragments of code, is thus an important task with many applications, ranging from code search to bug finding and malware detection. In this paper, we propose a new approach to detect code clones in source code. Instead of analyzing the code tokens or syntax, our technique is based upon control flow analysis and dominator trees. In this way, the technique not only detects exact and syntactically similar near-miss code clones but also two new types of clones, which we characterize as structural code clones and subclones. For implementation and evaluation, we have developed the tool StoneDetector, which finds code clones in Java source code. StoneDetector performs competitively with the state of the art as measured on the BigCloneBench benchmark and finds more structural clones and subclones.
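To illustrate the notion of a dominator tree mentioned in the abstract, the following is a minimal sketch of the textbook iterative dominator computation on a small control flow graph. It is not StoneDetector's implementation, and the abstract does not describe how the tool builds or compares its trees. The class name `DominatorSketch`, the method `immediateDominators`, and the successor-array encoding of the graph are hypothetical choices made for this example; node 0 is assumed to be the entry node and every node is assumed to be reachable from it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only: not taken from StoneDetector.
final class DominatorSketch {

    // Returns idom[v], the immediate dominator of each node v, or -1 for the entry.
    // succ[u] lists the successors of node u; node 0 is assumed to be the entry
    // and all nodes are assumed to be reachable from it.
    static int[] immediateDominators(int[][] succ) {
        int n = succ.length;

        // Derive predecessor lists from the successor lists.
        List<List<Integer>> pred = new ArrayList<>();
        for (int i = 0; i < n; i++) pred.add(new ArrayList<>());
        for (int u = 0; u < n; u++) {
            for (int v : succ[u]) pred.get(v).add(u);
        }

        // dom[v] starts as "all nodes" for every node except the entry.
        BitSet[] dom = new BitSet[n];
        for (int v = 0; v < n; v++) {
            dom[v] = new BitSet(n);
            if (v == 0) dom[v].set(0); else dom[v].set(0, n);
        }

        // Iterate dom[v] = {v} united with the intersection of dom[p]
        // over all predecessors p, until nothing changes.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int v = 1; v < n; v++) {
                BitSet next = new BitSet(n);
                next.set(0, n);
                for (int p : pred.get(v)) next.and(dom[p]);
                next.set(v);
                if (!next.equals(dom[v])) {
                    dom[v] = next;
                    changed = true;
                }
            }
        }

        // The strict dominators of v form a chain; the one with the largest
        // dominator set is the closest to v, i.e. its immediate dominator.
        int[] idom = new int[n];
        idom[0] = -1;
        for (int v = 1; v < n; v++) {
            int best = -1;
            for (int d = dom[v].nextSetBit(0); d >= 0; d = dom[v].nextSetBit(d + 1)) {
                if (d == v) continue;
                if (best < 0 || dom[d].cardinality() > dom[best].cardinality()) best = d;
            }
            idom[v] = best;
        }
        return idom;
    }

    public static void main(String[] args) {
        // CFG of an if/else: 0 -> {1, 2}, 1 -> 3, 2 -> 3.
        int[][] diamond = { {1, 2}, {3}, {3}, {} };
        System.out.println(Arrays.toString(immediateDominators(diamond)));
    }
}
```

In the if/else example, the join node 3 is dominated only by the entry, because neither branch lies on every path to it. The `idom` array encodes the dominator tree as parent pointers, a representation that abstracts away from tokens and syntax.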
Pages: 70-80
Page count: 11