Transfer language selection for zero-shot cross-lingual abusive language detection

Cited by: 18
Authors
Eronen, Juuso [1]
Ptaszynski, Michal [1]
Masui, Fumito [1]
Arata, Masaki [1]
Leliwa, Gniewosz [2]
Wroczynski, Michal [2]
Affiliations
[1] Kitami Institute of Technology, 165 Koencho, Kitami, Hokkaido 0900015, Japan
[2] Samurai Labs, Aleja Zwyciestwa 96-98, PL-81451 Gdynia, Poland
Keywords
Abusive language detection; Zero-shot learning; Transfer learning; Linguistics; Similarity
DOI
10.1016/j.ipm.2022.102981
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We study the selection of transfer languages for automatic abusive language detection. Instead of preparing a dataset for every language, we demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language detection. In this way, existing data from higher-resource languages can be used to build better detection systems for low-resource languages. Our datasets cover seven languages from three language families. We measure the distance between these languages using several language similarity measures, in particular one obtained by quantifying the World Atlas of Language Structures (WALS). We show that there is a correlation between linguistic similarity and classifier performance. This finding allows us to choose an optimal transfer language for zero-shot abusive language detection.
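The selection procedure summarized in the abstract (score each candidate transfer language by its similarity to the target, check that similarity tracks zero-shot performance, then pick the most similar source) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The similarity is assumed to be the share of WALS-style features, among those coded for both languages, that take the same value. The correlation is assumed to be Pearson's r. The language labels, feature IDs, feature values, and F1 scores are hypothetical placeholders, not data or results from the paper.

```python
from math import sqrt
from typing import Dict, List, Sequence

FeatureTable = Dict[str, Dict[str, int]]

# Hypothetical WALS-style feature codings: language -> {feature_id: value}.
# Language labels, feature IDs and values are placeholders, not real WALS data.
FEATURES: FeatureTable = {
    "target":   {"F1": 1, "F2": 2, "F3": 1, "F4": 3},
    "source_a": {"F1": 1, "F2": 2, "F3": 2, "F4": 3},
    "source_b": {"F1": 2, "F2": 1, "F3": 2},  # F4 not coded for this language
    "source_c": {"F1": 1, "F2": 2, "F3": 1, "F4": 3},
}

# Hypothetical zero-shot F1 on the target when training on each source
# (made-up numbers for illustration, not results from the paper).
F1_SCORES: Dict[str, float] = {"source_a": 0.60, "source_b": 0.40, "source_c": 0.70}


def wals_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Share of features coded for both languages that take the same value (assumed measure)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[f] == b[f] for f in shared) / len(shared)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def choose_transfer_language(target: str, candidates: List[str], features: FeatureTable) -> str:
    """Pick the candidate source language most similar to the target."""
    return max(candidates, key=lambda c: wals_similarity(features[target], features[c]))


if __name__ == "__main__":
    candidates = ["source_a", "source_b", "source_c"]
    sims = [wals_similarity(FEATURES["target"], FEATURES[c]) for c in candidates]
    r = pearson(sims, [F1_SCORES[c] for c in candidates])
    print("similarity vs. F1, Pearson r =", round(r, 3))
    print("chosen transfer language:", choose_transfer_language("target", candidates, FEATURES))
```

Ranking by a single similarity score is the simplest selection rule. Since the abstract mentions several similarity measures, a fuller version would compute each of them and compare how strongly each one correlates with classifier performance before relying on it for selection.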
Pages: 18
References: 82