Cyberbullying detection in social media text based on character-level convolutional neural network with shortcuts

Cited by: 39
Authors
Lu, Nijia [1]
Wu, Guohua [1]
Zhang, Zhen [1, 4]
Zheng, Yitao [1]
Ren, Yizhi [1]
Choo, Kim-Kwang Raymond [2, 3]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Cyberspace, Hangzhou, Zhejiang, Peoples R China
[2] Univ Texas San Antonio, Dept Informat Syst & Cyber Secur, San Antonio, TX USA
[3] Univ Texas San Antonio, Dept Elect & Comp Engn, San Antonio, TX USA
[4] 1158, 2 St, Baiyang St, Hangzhou 310018, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
convolutional neural networks; cyberbullying detection; social network; text classification;
DOI
10.1002/cpe.5627
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
As people spend more and more time on social networks, cyberbullying has become a social problem that calls for machine learning solutions. Our research focuses on textual cyberbullying detection because text is the most common form of social media content. However, social media text is short, noisy, and unstructured, with misspellings and stray symbols, and this degrades the performance of some traditional machine learning methods that rely on vocabulary knowledge. For this reason, we propose Char-CNNS (Character-level Convolutional Neural Network with Shortcuts), a model that identifies whether social media text contains cyberbullying. We use characters as the smallest unit of learning, which enables the model to overcome spelling errors and intentional obfuscation in real-world corpora. Shortcuts are used to stitch together features from different levels so that the model learns more granular bullying signals, and a focal loss function is adopted to overcome the class imbalance problem. We also provide a new Chinese Weibo comment dataset built specifically for cyberbullying detection, and experiments are performed on both the Chinese Weibo dataset and an English Tweet dataset. The experimental results show that our approach is competitive with state-of-the-art techniques on the cyberbullying detection task.
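The abstract names focal loss as the remedy for class imbalance but gives no formula or hyperparameters. The sketch below illustrates the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and is not the authors' implementation. The function names and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration; the paper's actual settings are not stated in this record.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy for one example, shown for comparison.

    p: predicted probability of the positive (bullying) class, in [0, 1].
    y: true label, 1 for bullying, 0 for non-bullying.
    """
    # p_t is the probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Clamp away from zero so log() is always defined.
    return -math.log(min(max(p_t, eps), 1.0))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one example.

    gamma and alpha are hypothetical defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    # The (1 - p_t)^gamma factor shrinks the loss of well-classified examples,
    # so the abundant easy examples contribute less to the total.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, cross_entropy(p, 1), binary_focal_loss(p, 1))
```

With gamma = 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy. Larger gamma values down-weight confidently correct predictions more strongly, which is why the loss suits datasets where non-bullying text vastly outnumbers bullying text.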
Pages: 11