Towards Accurate Detection of Offensive Language in Online Communication in Arabic

Cited by: 59
Authors
Alakrot, Azalden [1]
Murray, Liam [2]
Nikolov, Nikola S. [1]
Affiliations
[1] Univ Limerick, Dept Comp Sci & Informat Syst, Limerick, Ireland
[2] Univ Limerick, Sch Languages, Limerick, Ireland
Source
ARABIC COMPUTATIONAL LINGUISTICS | 2018 / Vol. 142
Keywords
Anti-social behaviour online; offensive language detection; harassment detection; Arabic dataset; text mining; SVM for offensive language detection in Arabic;
DOI
10.1016/j.procs.2018.10.491
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text. (C) 2018 The Authors. Published by Elsevier B.V.
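The abstract names word-level and N-gram features as inputs to the classifier. As a rough illustration only, the sketch below counts word n-grams in a single comment. The function names `word_ngrams` and `extract_features`, the whitespace tokenisation, and the default of unigrams plus bigrams are all assumptions made for this example; the record does not describe the authors' actual feature pipeline or pre-processing steps.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of `tokens` as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(comment, max_n=2):
    """Count word n-gram features for n = 1..max_n.

    Tokenisation is plain whitespace splitting, which is a simplifying
    assumption and not a step taken from the paper.
    """
    tokens = comment.split()
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    return features


# Hypothetical placeholder comment; real input would be Arabic text.
example_features = extract_features("this is a comment")
```

Counts of this kind would typically be turned into fixed-length vectors before being passed to a Support Vector Machine implementation.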
Pages: 315-320
Number of pages: 6