Experiments on the Indonesian Plagiarism Detection using Latent Semantic Analysis

Times cited: 0
Authors
Soleman, Sidik
Purwarianti, Ayu
Source
2014 2nd International Conference on Information and Communication Technology (ICoICT) | 2014
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Plagiarism detection is an important task because the number of plagiarism cases is increasing and plagiarism techniques are becoming harder to detect. Plagiarism is no longer only literal; there is also intelligent plagiarism. To handle intelligent plagiarism, we employed latent semantic analysis (LSA) as the term-document representation. LSA was used in both the Heuristic Retrieval (HR) component and the Detailed Analysis (DA) component. We conducted several experiments to compare token types, text segmentation methods, and threshold values. The test data were prepared manually from an available corpus of Indonesian papers. Experimental results showed that LSA outperformed the Vector Space Model (VSM), especially in test cases involving intelligent plagiarism.
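
The contrast between the VSM and LSA representations mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentences, the whitespace tokenizer, the raw term counts, the choice of k = 2, and the helper names (term_document_matrix, lsa_document_vectors, cosine) are all assumptions made for illustration. It scores one suspicious text against candidate sources, first with raw term vectors (VSM) and then with vectors projected into a low-rank latent space through a truncated SVD (LSA).

```python
import numpy as np


def term_document_matrix(docs):
    """Return (vocabulary, matrix): one row per term, one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            matrix[index[tok], j] += 1.0  # raw term frequency
    return vocab, matrix


def lsa_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # Keep the k largest singular values; each returned row is one document.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy data invented for illustration; not taken from the paper's corpus.
sources = [
    "the student copied the essay",
    "the pupil copied the paper",
    "rain fell on the city",
]
suspicious = "the pupil copied the essay"
docs = sources + [suspicious]

vocab, tdm = term_document_matrix(docs)
lsa_vecs = lsa_document_vectors(tdm, k=2)

# Similarity of the suspicious text (last column/row) to each source.
vsm_scores = [cosine(tdm[:, -1], tdm[:, j]) for j in range(len(sources))]
lsa_scores = [cosine(lsa_vecs[-1], lsa_vecs[j]) for j in range(len(sources))]

for j, src in enumerate(sources):
    print(f"{src!r}: VSM={vsm_scores[j]:.3f}  LSA={lsa_scores[j]:.3f}")
```

In a pipeline of the kind the abstract describes, scores like these would be compared against a threshold to decide which source documents or segments to report; the threshold, tokenization, and segmentation are exactly the settings the experiments compare.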
Pages: 6