A Source Code Similarity System for Plagiarism Detection

Cited by: 51
Authors
Duric, Zoran [1]
Gasevic, Dragan [2]
Affiliations
[1] Univ Banja Luka, Fac Elect Engn, Banja Luka 78000, Bosnia & Herceg
[2] Athabasca Univ, Sch Comp & Informat Syst, Athabasca, AB, Canada
Keywords
algorithms; plagiarism; similarity detection; software; source code
DOI
10.1093/comjnl/bxs018
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Source code plagiarism is easy to commit but very difficult to detect without proper tool support. Various source code similarity detection systems have been developed to help detect it. Those systems need to recognize a number of lexical and structural source code modifications. For example, some structural modifications (e.g. modification of control structures, modification of data structures or structural redesign of source code) can change the source code in such a way that it almost looks genuine. Most existing source code similarity detection systems can be confused when these structural modifications have been applied to the original source code. To be considered effective, a source code similarity detection system must address these issues. To address them, we designed and developed a source code similarity system for plagiarism detection. To demonstrate that the proposed system has the desired effectiveness, we performed a well-known conformism test. The proposed system showed promising results compared with the JPlag system in detecting source code similarity when various lexical or structural modifications are applied to plagiarized code. As a confirmation of these results, an independent samples t-test revealed a statistically significant difference between the average F-measure values for the test sets that we used and the experiments that we performed, within the practically usable range of cut-off threshold values of 35-70%.
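The evaluation described in the abstract relies on an F-measure computed at a given cut-off threshold. The abstract does not say which F-measure variant or data layout the authors used, so the following is only a minimal sketch. It assumes the standard balanced F-measure, F = 2PR / (P + R), and treats a pair of submissions as flagged when its similarity score (in percent) is at or above the cut-off. The function name `f_measure_at_cutoff`, the pair labels and all scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def f_measure_at_cutoff(similarity: Dict[Pair, float],
                        plagiarized: Set[Pair],
                        cutoff: float) -> float:
    """Balanced F-measure when pairs scoring >= cutoff are flagged.

    Assumption: similarity scores are percentages and `plagiarized`
    holds the pairs known to be plagiarized (the ground truth).
    """
    flagged = {pair for pair, score in similarity.items() if score >= cutoff}
    true_positives = len(flagged & plagiarized)
    if true_positives == 0:
        # Nothing correct was flagged, so precision or recall is zero.
        return 0.0
    precision = true_positives / len(flagged)
    recall = true_positives / len(plagiarized)
    return 2 * precision * recall / (precision + recall)


# Hypothetical similarity scores for four pairs of submissions.
scores = {
    ("a", "b"): 82.0,
    ("a", "c"): 41.0,
    ("b", "c"): 38.0,
    ("c", "d"): 12.0,
}
# Hypothetical ground truth: only these two pairs are plagiarized.
truth = {("a", "b"), ("a", "c")}

# Sweep a few cut-offs inside the 35-70% range named in the abstract.
for cutoff in (35, 50, 70):
    print(cutoff, round(f_measure_at_cutoff(scores, truth, cutoff), 3))
```

Sweeping the cut-off in this way shows how the threshold trades precision against recall: a low cut-off flags more pairs and raises recall, while a high cut-off flags fewer pairs and may miss lightly disguised copies.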
Pages: 70-86
Number of pages: 17