WASTK: A Weighted Abstract Syntax Tree Kernel Method for Source Code Plagiarism Detection

Cited by: 30
Authors
Fu, Deqiang [1, 2]
Xu, Yanyan [1 ]
Yu, Haoran [2 ]
Yang, Boyang [2 ]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, 35 Qinghuadong Rd, Beijing 100083, Peoples R China
[2] Beijing Judao Youda Network Technol Co Ltd, Jisuan Inst Technol, 18 Suzhoujie St,Room 1204, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Trees (mathematics) - Inverse problems - Intellectual property - Text processing - Computer programming languages - Education computing - Syntactics;
DOI
10.1155/2017/7809047
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202; 0835;
Abstract
In this paper, we introduce WASTK (Weighted Abstract Syntax Tree Kernel), a source code plagiarism detection method for computer science education. Unlike other plagiarism detection methods, WASTK takes into account aspects beyond the raw similarity between programs. WASTK first converts the source code of a program into an abstract syntax tree and then computes the similarity as the tree kernel of two abstract syntax trees. To avoid misjudgments caused by trivial code snippets or by frameworks supplied by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from information retrieval is applied: each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK is evaluated on different datasets and performs much better than other popular methods such as Sim and JPlag.
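The pipeline described in the abstract (source code to abstract syntax tree, tree kernel between two trees, TF-IDF-style node weights) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Python's built-in `ast` module as the parser, a Collins-Duffy-style subtree kernel with a decay factor `lam`, and a smoothed IDF weight per node "production" (node type plus child types). The names `production`, `idf_weights`, `kernel`, and `similarity`, and the exact weighting formula, are hypothetical choices made for illustration only.

```python
import ast
import math
from collections import Counter


def production(node):
    """Label a node by its type and the types of its direct children."""
    kids = [type(c).__name__ for c in ast.iter_child_nodes(node)]
    return type(node).__name__ + "(" + ",".join(kids) + ")"


def idf_weights(trees):
    """Assumed smoothed IDF: productions found in many trees get lower weight."""
    df = Counter()
    for tree in trees:
        df.update({production(n) for n in ast.walk(tree)})
    n = len(trees)
    return {p: math.log((1 + n) / (1 + c)) + 1.0 for p, c in df.items()}


def _common(n1, n2, w, lam, memo):
    """Weighted count of common subtree fragments rooted at n1 and n2."""
    key = (id(n1), id(n2))
    if key in memo:
        return memo[key]
    p = production(n1)
    if p != production(n2):
        val = 0.0
    else:
        # Productions unseen in the corpus fall back to weight 1.0 (assumption).
        val = lam * w.get(p, 1.0)
        for a, b in zip(ast.iter_child_nodes(n1), ast.iter_child_nodes(n2)):
            val *= 1.0 + _common(a, b, w, lam, memo)
    memo[key] = val
    return val


def kernel(t1, t2, w, lam=0.5):
    """Sum the weighted fragment counts over all node pairs of the two trees."""
    memo = {}
    return sum(_common(a, b, w, lam, memo)
               for a in ast.walk(t1) for b in ast.walk(t2))


def similarity(src1, src2, w, lam=0.5):
    """Normalized kernel, in [0, 1] up to floating-point rounding."""
    t1, t2 = ast.parse(src1), ast.parse(src2)
    denom = math.sqrt(kernel(t1, t1, w, lam) * kernel(t2, t2, w, lam))
    return kernel(t1, t2, w, lam) / denom if denom else 0.0


corpus = [
    "def f(x):\n    return x + 1\n",
    "def g(y):\n    return y + 1\n",   # same structure, renamed identifiers
    "for i in range(3):\n    print(i)\n",
]
weights = idf_weights([ast.parse(s) for s in corpus])
print(similarity(corpus[0], corpus[1], weights))
print(similarity(corpus[0], corpus[2], weights))
```

Because the sketch compares node types rather than identifier names, renaming variables does not change the score, while structurally different programs score lower. Productions that appear in every submission, such as an instructor-supplied framework, receive the smallest weight under the assumed IDF formula.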
Pages: 8