Maintaining Academic Integrity in Programming: Locality-Sensitive Hashing and Recommendations

Cited by: 6
Authors
Karnalim, Oscar [1]
Affiliations
[1] Maranatha Christian Univ, Fac Informat Technol, Bandung 40164, Indonesia
Keywords
programming; plagiarism; collusion; similarity detection; recommendations; higher education; CODE PLAGIARISM DETECTION;
DOI
10.3390/educsci13010054
Chinese Library Classification
G40 [Education];
Subject classification codes
040101 ; 120403 ;
Abstract
Not many efficient similarity detectors are employed in practice to maintain academic integrity. Perhaps it is because they lack intuitive reports for investigation, they only have a command line interface, and/or they are not publicly accessible. This paper presents SSTRANGE, an efficient similarity detector with locality-sensitive hashing (MinHash and Super-Bit). The tool features intuitive reports for investigation and a graphical user interface. Further, it is accessible on GitHub. SSTRANGE was evaluated on the SOCO dataset under two performance metrics: f-score and processing time. The evaluation shows that both MinHash and Super-Bit are more efficient than their predecessors (Cosine and Jaccard with 60% less processing time) and a common similarity measurement (running Karp-Rabin greedy string tiling with 99% less processing time). Further, the effectiveness trade-off is still reasonable (no more than 24%). Higher effectiveness can be obtained by tuning the number of clusters and stages. To encourage the use of automated similarity detectors, we provide ten recommendations for instructors interested in employing such detectors for the first time. These include consideration of assessment design, irregular patterns of similarity, multiple similarity measurements, and effectiveness-efficiency trade-off. The recommendations are based on our 2.5-year experience employing similarity detectors (SSTRANGE's predecessors) in 13 course offerings with various assessment designs.
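The abstract's comparison between MinHash and its predecessor, exact Jaccard similarity, can be illustrated with a minimal sketch. It is not SSTRANGE's implementation: the function names (`shingles`, `jaccard`, `minhash_signature`, `estimate_similarity`), the token 3-gram representation, and the choice of 128 salted BLAKE2b hashes are all assumptions made for illustration only.

```python
import hashlib


def shingles(tokens, n=3):
    """Return the set of overlapping token n-grams (hypothetical representation)."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity between two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed, item):
    # One member of a family of hash functions, selected by a salt (assumed design).
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=128):
    """Keep the minimum hash value of the set under each hash function."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    code_a = "int i = 0 ; while ( i < n ) { sum += i ; i ++ ; }".split()
    code_b = "int j = 0 ; while ( j < n ) { sum += j ; j ++ ; }".split()
    set_a, set_b = shingles(code_a), shingles(code_b)
    print("exact Jaccard:   ", jaccard(set_a, set_b))
    print("MinHash estimate:", estimate_similarity(
        minhash_signature(set_a), minhash_signature(set_b)))
```

Each signature has a fixed length no matter how large the submission is, so comparing two signatures costs the same for short and long programs. A full locality-sensitive hashing scheme would additionally group signatures into buckets so that only likely matches are compared; that step is omitted here.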
Pages: 23
Related papers
59 in total
[21]   Uncovering Source Code Reuse in Large-Scale Academic Environments [J].
Flores, Enrique ;
Barron-Cedeno, Alberto ;
Moreno, Lidia ;
Rosso, Paolo .
COMPUTER APPLICATIONS IN ENGINEERING EDUCATION, 2015, 23 (03) :383-390
[22]   Cross-Language Source Code Plagiarism Detection using Explicit Semantic Analysis and Scored Greedy String Tilling [J].
Foltynek, Tomas ;
Vsiansky, Richard ;
Meuschke, Norman ;
Dlabolova, Dita ;
Gipp, Bela .
PROCEEDINGS OF THE ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES IN 2020, JCDL 2020, 2020, :523-524
[23]   Collaboration, Collusion and Plagiarism in Computer Science Coursework [J].
Fraser, Robert .
INFORMATICS IN EDUCATION, 2014, 13 (02) :179-195
[24]   WASTK: A Weighted Abstract Syntax Tree Kernel Method for Source Code Plagiarism Detection [J].
Fu, Deqiang ;
Xu, Yanyan ;
Yu, Haoran ;
Yang, Boyang .
SCIENTIFIC PROGRAMMING, 2017, 2017
[25]   Retrieving and classifying instances of source code plagiarism [J].
Ganguly, Debasis ;
Jones, Gareth J. F. ;
Ramirez-de-la-Cruz, Aaron ;
Ramirez-de-la-Rosa, Gabriela ;
Villatoro-Tello, Esau .
INFORMATION RETRIEVAL JOURNAL, 2018, 21 (01) :1-23
[26]  
Inoue U., 2012, 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, P2308, DOI 10.1109/FSKD.2012.6234186
[27]  
Jadalla Ameera, 2008, International Journal of Business Intelligence and Data Mining, V3, P121, DOI 10.1504/IJBIDM.2008.020514
[28]  
Ji J., 2012, P ADV NEUR INF PROC, P108
[29]   A Plagiarism Detection Technique for Java Program Using Bytecode Analysis [J].
Ji, Jeong-Hoon ;
Woo, Gyun ;
Cho, Hwan-Gue .
THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS, 2008, :1092-1098
[30]  
Jiang Yanyan, 2018, P ACM TURING CELEBRA, P27, DOI 10.1145/3210713.3210724