Machine Learning Based Detection of Vandalism in Wikipedia across Languages

Cited by: 0
Authors
Susuri, Arsim [1 ]
Hamiti, Mentor [1 ]
Dika, Agni [1 ]
Affiliation
[1] South East European Univ, Fac Contemporary Sci & Technol, Tetovo, Macedonia
Source
2016 5TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO) | 2016
Keywords
Wikipedia; vandalism; machine learning;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper describes the application of machine learning algorithms to detect vandalism in two languages. Vandalism is a major issue in Wikipedia, accounting for about 1% of edits during 2015. The majority of vandalism comes from human editors, whose vandalism can be traced through access and edit logs. In this paper, we propose applying a set of classifiers in one language and then evaluating them across languages on two datasets: the hourly view counts of each Wikipedia article and the edit history of articles. For this purpose, Simple English and Albanian Wikipedia datasets are used. The results show that the characteristic features of vandalism can be learned from view and edit patterns, and that models built in one language can be applied successfully to other languages.
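The cross-language evaluation described in the abstract can be illustrated with a minimal sketch: fit a model on labeled edits from one language, then score it unchanged on edits from another. Everything here is an assumption made for illustration. The abstract does not name its classifiers or features, so a nearest-centroid classifier stands in for them, the two features (a view-count ratio and the number of characters changed) are hypothetical, and the data values are made up rather than taken from the paper.

```python
from statistics import mean

# Made-up toy rows, NOT data from the paper.
# Each row: ((view_ratio, chars_changed), label), where label 1 = vandalism, 0 = regular edit.
# Both feature names are hypothetical stand-ins for view and edit features.
SIMPLE_EN = [
    ((1.0, 20.0), 0), ((1.2, 35.0), 0), ((0.9, 15.0), 0),
    ((3.5, 400.0), 1), ((4.0, 520.0), 1), ((3.1, 380.0), 1),
]
ALBANIAN = [
    ((1.1, 25.0), 0), ((0.8, 10.0), 0),
    ((3.8, 450.0), 1), ((2.9, 300.0), 1),
]


def fit_centroids(rows):
    """Return {label: centroid}, the per-class mean of the feature vectors."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, features) == label for features, label in rows)
    return hits / len(rows)


# Train on one language, evaluate on the other without refitting.
model = fit_centroids(SIMPLE_EN)
cross_lang_accuracy = accuracy(model, ALBANIAN)
print(cross_lang_accuracy)
```

The features are left unscaled to keep the sketch short, so the larger-valued feature dominates the distance; a real pipeline would normally standardize features before comparing languages.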
Pages: 446-451
Page count: 6