How are functionally similar code clones syntactically different? An empirical study and a benchmark

Cited by: 12
Authors
Wagner, Stefan [1]
Abdulkhaleq, Asim [1]
Bogicevic, Ivan [1]
Ostberg, Jan-Peter [1]
Ramadani, Jasmin [1]
Affiliations
[1] Univ Stuttgart, Inst Software Technol, Stuttgart, Germany
Source
PeerJ Computer Science | 2016
Keywords
Code clone; Functionally similar clone; Empirical study; Benchmark;
DOI
10.7717/peerj-cs.49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Background. Today, redundancy in source code, so-called "clones" caused by copy&paste, can be found reliably using clone detection tools. Redundancy can also arise independently, however, without any copy&paste. At present, it is not clear how clones that are only functionally similar (FSCs) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactic differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We could detect a syntactic similarity in a part of the files in <16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs fell into the categories algorithm, data structure, OO design, I/O and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.
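To make the distinction in the abstract concrete, the following minimal sketch contrasts two programs that behave alike but share little syntax. It is an illustration only and not the study's method or tooling: the study used Java and C contest programs, whereas this sketch uses Python, and every name in it (`SRC_SORT`, `SRC_SET`, `count_distinct`, `token_similarity`, `behaves_alike`, `load`) is hypothetical. The two snippets differ in the "algorithm" and "data structure" categories named above; a crude token-sequence comparison stands in for syntactic detection, and random-input testing stands in for a functional check.

```python
import difflib
import io
import random
import tokenize

# Two hypothetical solutions to the same task: count the distinct values.
# Variant 1 sorts and compares neighbours (different algorithm).
SRC_SORT = """
def count_distinct(values):
    ordered = sorted(values)
    count = 0
    for i in range(len(ordered)):
        if i == 0 or ordered[i] != ordered[i - 1]:
            count += 1
    return count
"""

# Variant 2 relies on a hash set (different data structure).
SRC_SET = """
def count_distinct(values):
    return len(set(values))
"""

# Token types that carry layout only and are ignored in the comparison.
_LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(source):
    """Return the non-layout token strings of a Python source text."""
    reader = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(reader)
            if tok.type not in _LAYOUT]


def token_similarity(source_a, source_b):
    """Crude syntactic similarity in [0, 1] based on token sequences."""
    matcher = difflib.SequenceMatcher(None, tokens(source_a), tokens(source_b))
    return matcher.ratio()


def load(source):
    """Execute a source text and return its count_distinct function."""
    namespace = {}
    exec(source, namespace)
    return namespace["count_distinct"]


def behaves_alike(func_a, func_b, trials=200, seed=0):
    """Compare outputs on random inputs (evidence, not proof, of equivalence)."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        if func_a(list(data)) != func_b(list(data)):
            return False
    return True


if __name__ == "__main__":
    print("token similarity:", round(token_similarity(SRC_SORT, SRC_SET), 2))
    print("same behaviour on samples:",
          behaves_alike(load(SRC_SORT), load(SRC_SET)))
```

The token comparison here is deliberately simplistic; real clone detectors normalise identifiers and work on larger fragments, so the numbers it produces are not comparable to the percentages reported in the abstract.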
Pages: 26