An AST-Based Code Plagiarism Detection Algorithm

Cited by: 25
Authors
Zhao, Jingling [1, 2]
Xia, Kunfeng [1, 2]
Fu, Yilun [3]
Cui, Baojiang [1, 2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing, Peoples R China
[2] Natl Engn Lab Mobile Network Secur, Beijing, Peoples R China
[3] China Elect Power Res Inst, Beijing, Peoples R China
Source
2015 10TH INTERNATIONAL CONFERENCE ON BROADBAND AND WIRELESS COMPUTING, COMMUNICATION AND APPLICATIONS (BWCCA 2015) | 2015
Keywords
plagiarism detection; tree-based technology; code comparison; abstract syntax tree; hash value;
DOI
10.1109/BWCCA.2015.52
Chinese Library Classification (CLC) Number
TP301 [Theory, Methods];
Discipline Classification Code
081202 ;
Abstract
In modern software engineering, software plagiarism is widespread and unchecked, so developing plagiarism detection methods is imperative. Popular software plagiarism detection techniques are mostly based on text, tokens, or syntax trees. Among these, tree-based detection can effectively detect plagiarized code that the other two kinds of techniques miss. In this paper, we propose a more effective plagiarism detection algorithm based on the abstract syntax tree (AST), which computes hash values of the syntax tree nodes and compares them. To implement the algorithm more effectively, a special measure is taken to reduce the error rate when calculating the hash values of operations, especially arithmetic operations such as subtraction and division. Test results show that the measure is reliable and necessary. The algorithm performs well in code comparison and is helpful for protecting source code copyright.
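The abstract does not give the hash function itself, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes Python's standard `ast` module as the parser, ignores identifier names so that renaming does not change a hash, and treats the operands of `+` and `*` as unordered while keeping operand order for `-` and `/`. The names `subtree_hashes`, `fingerprint`, and `similarity`, and the Jaccard-style score, are hypothetical choices made for illustration.

```python
import ast
import hashlib
from collections import Counter

# Assumption: only + and * are treated as order-insensitive.
# Subtraction and division keep their operand order.
COMMUTATIVE = (ast.Add, ast.Mult)


def _digest(parts):
    """Hash a list of strings into a short hex digest."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


def subtree_hashes(node, out):
    """Return the hash of `node` and append every subtree hash to `out`."""
    children = [subtree_hashes(c, out) for c in ast.iter_child_nodes(node)]
    if isinstance(node, ast.BinOp) and isinstance(node.op, COMMUTATIVE):
        # Sorting makes a + b and b + a hash identically.
        children.sort()
    # The label is the node type only, so identifier names are ignored.
    label = type(node).__name__
    if isinstance(node, ast.Constant):
        label += repr(node.value)  # literal values still count
    h = _digest([label] + children)
    out.append(h)
    return h


def fingerprint(source):
    """Multiset of all subtree hashes in `source`."""
    out = []
    subtree_hashes(ast.parse(source), out)
    return Counter(out)


def similarity(src_a, src_b):
    """Jaccard-style overlap of two fingerprints, in [0, 1]."""
    a, b = fingerprint(src_a), fingerprint(src_b)
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 1.0


if __name__ == "__main__":
    # Renamed variable and swapped operands of a commutative operator.
    print(similarity("y = a + 1", "y = 1 + b"))
    # Swapped operands of subtraction change the meaning of the code.
    print(similarity("y = a - 1", "y = 1 - a"))
```

Under these assumptions the first pair scores 1.0, because the two trees differ only in a variable name and in the operand order of `+`. The second pair scores below 1.0, because the operand order of `-` is preserved in the hash.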
Pages: 178-182
Page count: 5
Related Papers
17 in total
[1]  
Arabyarmohamady S., INT MOB COMP AID LEA, P180
[2]   Clone detection using abstract syntax trees [J].
Baxter, ID ;
Yahin, A ;
Moura, L ;
Sant'Anna, M ;
Bier, L .
INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, PROCEEDINGS, 1998, :368-377
[3]  
Butakov Sergey, INF SCI APPL ICISA 2, P1
[4]   Heap Graph Based Software Theft Detection [J].
Chan, Patrick P. F. ;
Hui, Lucas C. K. ;
Yiu, S. M. .
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2013, 8 (01) :101-110
[5]  
Chuda D., 2012, IEEE T ED, V55
[6]   An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis [J].
Cosma, Georgina ;
Joy, Mike .
IEEE TRANSACTIONS ON COMPUTERS, 2012, 61 (03) :379-394
[7]  
Gipp Bela, 2011, JUN 2011 P 11 ANN IN
[8]   Plagiarism in programming assignments [J].
Joy, M ;
Luck, M .
IEEE TRANSACTIONS ON EDUCATION, 1999, 42 (02) :129-133
[9]   CCFinder: A multilinguistic token-based code clone detection system for large scale source code [J].
Kamiya, T ;
Kusumoto, S ;
Inoue, K .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2002, 28 (07) :654-670
[10]  
Komondoor R, 2001, LECT NOTES COMPUT SC, V2126, P40