The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English

Times cited: 15
Authors
Ljubesic, Nikola [1]
Fiser, Darja [1,2]
Erjavec, Tomaz [1]
Affiliations
[1] Jozef Stefan Institute, Department of Knowledge Technologies, Ljubljana, Slovenia
[2] University of Ljubljana, Faculty of Arts, Department of Translation, Ljubljana, Slovenia
Source
TEXT, SPEECH, AND DIALOGUE (TSD 2019) | 2019, Vol. 11697
Keywords
Socially unacceptable discourse; Slovene language; English language; Manually annotated dataset
DOI
10.1007/978-3-030-27947-9_9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we present datasets of Facebook comment threads on mainstream media posts in Slovene and English, developed within the Slovene national project FRENK (the acronym stands for "FRENK - Raziskave Elektronske Nespodobne Komunikacije", in English "Research on Electronic Inappropriate Communication"). The datasets cover two topics, migrants and LGBT, and are manually annotated for different types of socially unacceptable discourse (SUD). Their main advantages over existing datasets are identical sampling procedures, which yield comparable data across the two languages, and an annotation schema that takes into account six types of SUD and five targets at which SUD is directed. We describe the sampling and annotation procedures and analyze the annotation distributions and inter-annotator agreement. We consider these datasets an important milestone in understanding and combating SUD in both languages.
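The abstract reports inter-annotator agreement, but this record does not say which agreement measure the paper uses. As a rough illustration only, the following sketch computes Cohen's kappa, one common chance-corrected agreement measure for two annotators who labelled the same items. The function name `cohen_kappa` and the binary "SUD"/"non-SUD" labels are hypothetical and are not taken from the FRENK annotation schema; the same computation works unchanged for a schema with more categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Hypothetical helper: Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same label.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected (chance) agreement: probability that two independent annotators
    # with these label distributions pick the same label for an item.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of six comments by two annotators.
annotator_1 = ["SUD", "SUD", "non-SUD", "non-SUD", "SUD", "non-SUD"]
annotator_2 = ["SUD", "non-SUD", "non-SUD", "non-SUD", "SUD", "SUD"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.333
```

Here the two annotators agree on 4 of 6 comments (observed agreement 0.667), but since each uses both labels equally often, chance agreement is already 0.5, so kappa drops to 0.333. This is why chance-corrected measures are usually preferred over raw percent agreement when reporting annotation quality.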
Pages: 103-114
Number of pages: 12