An Assessment of Deep Learning Models and Word Embeddings for Toxicity Detection within Online Textual Comments

Cited by: 16
Authors
Dessi, Danilo [1, 2]
Recupero, Diego Reforgiato [3]
Sack, Harald [1, 2]
Affiliations
[1] FIZ Karlsruhe Leibniz Inst Informat Infrastruct, Hermann von Helmholtz Pl 1, D-76344 Eggenstein Leopoldshafen, Germany
[2] Karlsruhe Inst Technol, Inst AIFB, Kaiserstr 89, D-76133 Karlsruhe, Germany
[3] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
Keywords
deep learning; word embeddings; toxicity detection; binary classification; sentiment analysis; classification; challenge
DOI
10.3390/electronics10070779
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Today, increasing numbers of people interact online, and the explosion of online communication produces a large volume of textual comments. However, a major drawback of online environments is that comments shared on digital platforms can hide hazards such as fake news, insults, harassment, and, more generally, comments that may hurt someone's feelings. In this scenario, detecting this kind of toxicity plays an important role in moderating online communication. Deep learning technologies have recently delivered impressive performance in Natural Language Processing applications, including Sentiment Analysis and emotion detection, across numerous datasets. Such models do not need any pre-defined, hand-picked features; they learn sophisticated features from the input datasets by themselves. In this domain, word embeddings have been widely used to represent words in Sentiment Analysis tasks and have proved very effective. Therefore, in this paper, we investigated the use of deep learning and word embeddings to detect six different types of toxicity within online comments. In doing so, we evaluated the most suitable deep learning layers and state-of-the-art word embeddings for identifying toxicity. The results suggest that Long Short-Term Memory layers in combination with mimicked word embeddings are a good choice for this task.
Pages: 18