Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication

Cited by: 13
Authors
Bourgonje, Peter [1]
Moreno-Schneider, Julian [1]
Srivastava, Ankit [1]
Rehm, Georg [1]
Affiliations
[1] DFKI GmbH, Language Technol Lab, Alt Moabit 91c, D-10559 Berlin, Germany
Source
LANGUAGE TECHNOLOGIES FOR THE CHALLENGES OF THE DIGITAL AGE, GSCL 2017 | 2018, Vol. 10713
DOI
10.1007/978-3-319-73706-5_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The sheer ease with which abusive and hateful utterances can be made online using today's digital communication technologies (especially social media), typically from the comfort of one's home and without any immediate negative repercussions, is responsible for their significant increase and global ubiquity. Natural Language Processing technologies can help in addressing the negative effects of this development. In this contribution we evaluate a set of classification algorithms on two types of user-generated online content (tweets and Wikipedia Talk comments) in two languages (English and German). The data sets we work on were annotated for aspects such as racism, sexism, hate speech, aggression and personal attacks. While we acknowledge issues with inter-annotator agreement for classification tasks using these labels, this paper focuses on classifying the data according to the annotated characteristics using several text classification algorithms. For some classification tasks we are able to reach F-scores of up to 81.58.
Pages: 180-191
Page count: 12