Detecting Offensive Language in Social Media to Protect Adolescent Online Safety

Cited by: 289
Authors
Chen, Ying [1]
Zhou, Yilu [2]
Zhu, Sencun [1, 3]
Xu, Heng [3]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] George Washington Univ, Dept Informat Syst & Technol Management, Washington, DC 20037 USA
[3] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Source
PROCEEDINGS OF 2012 ASE/IEEE INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY, RISK AND TRUST AND 2012 ASE/IEEE INTERNATIONAL CONFERENCE ON SOCIAL COMPUTING (SOCIALCOM/PASSAT 2012) | 2012
Funding
U.S. National Science Foundation
Keywords
cyberbullying; adolescent safety; offensive languages; social media
DOI
10.1109/SocialCom-PASSAT.2012.55
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Because textual content on online social media is highly unstructured, informal, and often misspelled, existing research on message-level offensive language detection cannot accurately detect offensive content. User-level offensiveness detection appears to be a more feasible approach, but it remains under-researched. To bridge this gap, we propose the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. We distinguish the contributions of pejoratives/profanities and obscenities in determining offensive content, and we introduce hand-authored syntactic rules for identifying name-calling harassment. In particular, we incorporate a user's writing style, structure, and specific cyberbullying content as features to predict the user's potential to send out offensive content. Experimental results show that our LSF framework performs significantly better than existing methods in offensive content detection. It achieves a precision of 98.24% and a recall of 94.34% in sentence-level offensiveness detection, and a precision of 77.9% and a recall of 77.8% in user-level offensiveness detection. LSF processes a sentence in approximately 10 ms, which suggests it could be deployed effectively in social media.
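The sentence-level idea in the abstract can be illustrated with a minimal Python sketch. That idea has two parts: offensive-word categories that contribute separately, plus a rule for name-calling directed at the reader. The sketch also aggregates sentence scores into a user-level feature vector. Everything in it is an assumption made for illustration, not the LSF implementation. This includes the placeholder word lists, the weights and threshold, and the fixed token window. The window is a crude stand-in for the paper's hand-authored syntactic rules, which this sketch does not reproduce. The functions `score_sentence`, `is_offensive` and `user_features` are also hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder lexicons; NOT the word lists used by LSF.
PEJORATIVES = {"idiot", "loser", "moron"}
OBSCENITIES = {"damn", "crap"}
SECOND_PERSON = {"you", "u", "ur", "your"}

# Assumed weights and threshold, chosen only for illustration.
W_PEJORATIVE = 1.0
W_OBSCENITY = 0.5
W_NAME_CALLING = 2.0
WINDOW = 3        # max tokens after a second-person word to look for a pejorative
THRESHOLD = 1.0   # a sentence counts as offensive when total >= THRESHOLD


@dataclass
class SentenceScore:
    lexical: float       # weighted count of offensive words
    name_calling: bool   # whether the window rule fired

    @property
    def total(self) -> float:
        return self.lexical + (W_NAME_CALLING if self.name_calling else 0.0)


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def score_sentence(text: str) -> SentenceScore:
    tokens = tokenize(text)
    # Lexical part: each category contributes its own weight per occurrence.
    lexical = sum(
        W_PEJORATIVE if t in PEJORATIVES else W_OBSCENITY if t in OBSCENITIES else 0.0
        for t in tokens
    )
    # Crude stand-in for a syntactic rule: a second-person word followed,
    # within WINDOW tokens, by a pejorative ("you idiot", "you are a loser").
    name_calling = any(
        tok in SECOND_PERSON
        and any(t in PEJORATIVES for t in tokens[i + 1 : i + 1 + WINDOW])
        for i, tok in enumerate(tokens)
    )
    return SentenceScore(lexical, name_calling)


def is_offensive(text: str, threshold: float = THRESHOLD) -> bool:
    return score_sentence(text).total >= threshold


def user_features(sentences: list[str]) -> dict[str, float]:
    # Hypothetical user-level features aggregated from sentence scores;
    # a separate classifier (not shown) would consume this vector.
    scores = [score_sentence(s) for s in sentences]
    n = len(scores) or 1  # avoid division by zero for users with no posts
    return {
        "offensive_ratio": sum(s.total >= THRESHOLD for s in scores) / n,
        "name_calling_ratio": sum(s.name_calling for s in scores) / n,
        "mean_score": sum(s.total for s in scores) / n,
    }


if __name__ == "__main__":
    for s in ["you are a loser", "damn it", "the weather is nice"]:
        print(s, "->", score_sentence(s).total, is_offensive(s))
    # you are a loser -> 3.0 True
    # damn it -> 0.5 False
    # the weather is nice -> 0.0 False
```

The fixed window only approximates syntax. For example, it flags "you know that idiot" as name-calling even though the insult targets a third party. A rule over the sentence's actual syntactic structure can avoid this kind of error.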
Pages: 71-80
Number of pages: 10