Towards a Big Data Curated Benchmark of Inter-Project Code Clones

Cited by: 204
Authors
Svajlenko, Jeffrey [1]
Islam, Judith F. [1]
Keivanloo, Iman [2]
Roy, Chanchal K. [1]
Mia, Mohammad Mamun [1]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
[2] Queens Univ, Dept Elect & Comp Engn, Kingston, ON K7L 3N6, Canada
Source
2014 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) | 2014
DOI
10.1109/ICSME.2014.77
Chinese Library Classification
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Recently, new applications of code clone detection and search have emerged that rely upon clones detected across thousands of software systems. Big data clone detection and search algorithms have been proposed as an embedded part of these new applications. However, there exists no previous benchmark data for evaluating the recall and precision of these emerging techniques. In this paper, we present a big data clone detection benchmark that consists of known true and false positive clones in a big data inter-project Java repository. The benchmark was built by mining and then manually checking clones of ten common functionalities. The benchmark contains six million true positive clones of different clone types: Type-1, Type-2, Type-3 and Type-4, including various strengths of Type-3 similarity (strong, moderate, weak). These clones were found by three judges over 216 hours of manual validation efforts. We show how the benchmark can be used to measure the recall and precision of clone detection techniques.
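As a minimal sketch of how a curated set of known true and false positive clones supports this kind of measurement, the Python code below estimates a detector's recall (overall and per clone category) and its precision. It rests on several assumptions that do not come from the paper: fragments are identified by a hypothetical `(file, start_line, end_line)` tuple, a reported pair counts as a match only if it exactly equals a benchmark pair (a real evaluation would likely need a more tolerant matching criterion), and precision is computed only over reported pairs that the benchmark has judged, with the remainder counted as unjudged. The names `evaluate` and `pair_key`, the category labels, and the sample fragments are all illustrative.

```python
from collections import defaultdict

# Hypothetical fragment identifier: (file path, start line, end line).
Fragment = tuple[str, int, int]


def pair_key(a: Fragment, b: Fragment) -> frozenset:
    """Order-independent key, so (a, b) and (b, a) denote the same clone pair."""
    return frozenset((a, b))


def evaluate(detected, true_positives, false_positives):
    """Estimate recall and precision of a clone detector against a benchmark.

    detected:        iterable of (Fragment, Fragment) pairs reported by a tool
    true_positives:  dict mapping (Fragment, Fragment) -> clone category label
    false_positives: iterable of (Fragment, Fragment) pairs judged not to be clones
    """
    # Exact pair matching is an assumption of this sketch; real evaluations
    # may tolerate small boundary differences between fragments.
    found = {pair_key(a, b) for a, b in detected}

    # Group the known true clones by category label (e.g. clone type).
    tp_by_cat = defaultdict(set)
    for (a, b), category in true_positives.items():
        tp_by_cat[category].add(pair_key(a, b))
    all_tp = set().union(*tp_by_cat.values())
    fp = {pair_key(a, b) for a, b in false_positives}

    # Recall: share of known true clones that the tool reported.
    recall_by_category = {
        cat: len(pairs & found) / len(pairs) for cat, pairs in tp_by_cat.items()
    }
    recall = len(all_tp & found) / len(all_tp) if all_tp else None

    # Precision over judged pairs only: reported true clones divided by
    # reported pairs that the benchmark has a verdict for.
    hits_tp = len(found & all_tp)
    hits_fp = len(found & fp)
    judged = hits_tp + hits_fp
    precision = hits_tp / judged if judged else None

    return {
        "recall_by_category": recall_by_category,
        "recall": recall,
        "precision": precision,
        "unjudged": len(found - all_tp - fp),
    }


# Usage with made-up fragments and labels (not benchmark data).
f1 = ("A.java", 10, 30)
f2 = ("B.java", 5, 25)
f3 = ("C.java", 40, 60)
f4 = ("D.java", 1, 20)

true_pairs = {(f1, f2): "Type-1", (f1, f3): "Type-3 (moderate)"}
false_pairs = [(f2, f4)]
detected = [(f2, f1), (f2, f4), (f3, f4)]

result = evaluate(detected, true_pairs, false_pairs)
print(result)
# {'recall_by_category': {'Type-1': 1.0, 'Type-3 (moderate)': 0.0},
#  'recall': 0.5, 'precision': 0.5, 'unjudged': 1}
```

Restricting precision to judged pairs keeps the estimate within what the benchmark can actually confirm; the `unjudged` count shows how much of a tool's output falls outside the curated set and would need separate manual checking.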
Pages: 476-480
Page count: 5