Who are the haters? A corpus-based demographic analysis of authors of hate speech

Times cited: 1
Authors
Hilte, Lisa [1]
Markov, Ilia [2]
Ljubesic, Nikola [3, 4, 5]
Fiser, Darja [3, 5, 6]
Daelemans, Walter [1]
Affiliations
[1] Univ Antwerp, Fac Arts, Dept Linguist, CLIPS, Antwerp, Belgium
[2] Vrije Univ Amsterdam, Fac Humanities, Dept Language Literature & Commun, CLTL, Amsterdam, Netherlands
[3] Inst Jozef Stefan IJS, Dept Knowledge Technol, Ljubljana, Slovenia
[4] Univ Ljubljana, Fac Comp & Informat Sci, Lab Cognit Modeling, Ljubljana, Slovenia
[5] Inst Contemporary Hist, Ljubljana, Slovenia
[6] Univ Ljubljana, Fac Arts, Dept Translat, Ljubljana, Slovenia
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2023 / Volume 6
Keywords
hate speech; demographics; age; gender; language area
DOI
10.3389/frai.2023.986890
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Introduction: We examine the profiles of hate speech authors in a multilingual dataset of Facebook reactions to news posts discussing topics related to migrants and the LGBT+ community. The included languages are English, Dutch, Slovenian, and Croatian. Methods: First, all utterances were manually annotated as hateful or acceptable speech. Next, we used binary logistic regression to inspect how the production of hateful comments is impacted by authors' profiles (i.e., their age, gender, and language). Results: Our results corroborate previous findings: in all four languages, men produce more hateful comments than women, and people produce more hate speech as they grow older. But our findings also add important nuance to previously attested tendencies: specific age and gender dynamics vary slightly across languages or cultures, suggesting that distinct (e.g., socio-political) realities are at play. Discussion: Finally, we discuss why author demographics are important in the study of hate speech: the profiles of prototypical "haters" can be used for hate speech detection, for raising awareness of the spread of (online) hatred, and for counter-initiatives against it.
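The binary logistic regression named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or data: the toy sample is synthetic, the generating coefficients (`TRUE_B`, `TRUE_AGE`, `TRUE_MALE`) are hypothetical values chosen only so that the example has an effect to recover, and the language predictor is left out for brevity. The model is fitted with plain gradient descent in pure Python, and the fitted weights are read as odds ratios.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and intercept by full-batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear predictor
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic toy data, NOT taken from the paper. The generating
# coefficients below are hypothetical and purely illustrative.
random.seed(0)
TRUE_B, TRUE_AGE, TRUE_MALE = -1.5, 0.4, 0.8

X, y = [], []
for _ in range(1000):
    age = random.uniform(18, 70)
    male = random.randint(0, 1)
    # Predictors: age centred at 44 and scaled to decades; gender as a 0/1 dummy.
    x = [(age - 44.0) / 10.0, float(male)]
    p_hateful = sigmoid(TRUE_B + TRUE_AGE * x[0] + TRUE_MALE * x[1])
    X.append(x)
    y.append(1 if random.random() < p_hateful else 0)  # 1 = hateful, 0 = acceptable

w, b = fit_logistic(X, y)

# exp(coefficient) is the multiplicative change in the odds of a hateful comment.
odds_ratios = [math.exp(wj) for wj in w]
print("odds ratio per 10 years of age:", round(odds_ratios[0], 2))
print("odds ratio, men vs. women:", round(odds_ratios[1], 2))
```

An odds ratio above 1 for a predictor means that the odds of a comment being labelled hateful rise with that predictor, which is the form in which age and gender effects of this kind are usually reported. A real analysis would normally use a statistics package that also reports standard errors and significance tests.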
Pages: 12