WikiDetect: Automatic Vandalism Detection for Wikipedia Using Linguistic Features

Times cited: 0
Authors
Cioiu, Dan [1]
Rebedea, Traian [1]
Affiliation
[1] Univ Politehn Bucuresti, Dept Comp Sci & Engn, Bucharest, Romania
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS | 2013 / Vol. 8083
Keywords
Vandalism Detection; Wikipedia; Natural Language Processing; Classification
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Content vandalism has always been one of the greatest problems for Wikipedia, yet only a few fully automatic solutions to it have been developed so far. Volunteers still spend large amounts of time correcting vandalized page edits instead of using that time to improve the quality of article content. This paper introduces a new vandalism detection system that relies only on natural language processing and machine learning techniques. The system was evaluated on a corpus of real vandalized data to measure its performance and justify its design choices; the same expert-annotated wikitext, extracted from the encyclopedia's database, is used to evaluate different vandalism detection algorithms. The paper critically analyzes the results, compares them with existing solutions, and proposes several statistical classification methods that improve performance on this task.
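This record does not list the paper's actual feature set. Purely as an illustration of what "linguistic features" for a single edit can look like, here is a minimal sketch that assumes a word-level diff between the old and new revision text. The feature choices, the `VULGAR_WORDS` list, and the names `EditFeatures`, `inserted_text`, and `extract_features` are all hypothetical and are not taken from the paper.

```python
import difflib
import re
from dataclasses import dataclass

# Hypothetical word list for illustration only; not taken from the paper.
VULGAR_WORDS = {"stupid", "idiot", "sucks"}


@dataclass
class EditFeatures:
    upper_ratio: float      # share of letters in the inserted text that are uppercase
    longest_char_run: int   # longest run of one repeated character in the inserted text
    vulgar_count: int       # inserted tokens that appear in VULGAR_WORDS
    size_delta: int         # length of the new revision minus the old one, in characters


def inserted_text(old_text: str, new_text: str) -> str:
    """Return the words added or replaced by the edit, using a word-level diff."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return " ".join(added)


def longest_run(text: str) -> int:
    """Length of the longest run of the same character (0 for empty text)."""
    best = run = 0
    prev = None
    for ch in text:
        run = run + 1 if ch == prev else 1
        prev = ch
        best = max(best, run)
    return best


def extract_features(old_text: str, new_text: str) -> EditFeatures:
    """Compute the hypothetical features above for one edit (old -> new revision)."""
    added = inserted_text(old_text, new_text)
    letters = [ch for ch in added if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    tokens = re.findall(r"[a-z]+", added.lower())
    return EditFeatures(
        upper_ratio=upper / len(letters) if letters else 0.0,
        longest_char_run=longest_run(added),
        vulgar_count=sum(1 for tok in tokens if tok in VULGAR_WORDS),
        size_delta=len(new_text) - len(old_text),
    )


if __name__ == "__main__":
    old = "Paris is the capital of France."
    print(extract_features(old, old + " THIS SUCKS!!!!!"))
```

In a complete system, vectors like these would be computed for every edit in the annotated corpus and fed to a statistical classifier; which classifiers the paper compares is not stated in this record.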
Pages: 316-325
Page count: 10