A Large Human-Labeled Corpus for Online Harassment Research

Times cited: 109
Authors
Golbeck, Jennifer [1 ]
Ashktorab, Zahra [1 ]
Banjo, Rashad O. [1 ]
Berlinger, Alexandra [1 ]
Bhagwan, Siddharth [1 ]
Buntain, Cody [1 ]
Cheakalos, Paul [1 ]
Geller, Alicia A. [1 ]
Gergory, Quint [1 ]
Gnanasekaran, Rajesh Kumar [1 ]
Gunasekaran, Raja Rajan [1 ]
Hoffman, Kelly M. [1 ]
Hottle, Jenny [1 ]
Jienjitlert, Vichita [1 ]
Khare, Shivika [1 ]
Lau, Ryan [1 ]
Martindale, Marianna J. [1 ]
Naik, Shalmali [1 ]
Nixon, Heather L. [1 ]
Ramachandran, Piyush [1 ]
Rogers, Kristine M. [1 ]
Rogers, Lisa [1 ]
Sarin, Meghna Sardana [1 ]
Shahane, Gaurav [1 ]
Thanki, Jayanee [1 ]
Vengataraman, Priyanka [1 ]
Wan, Zijian [1 ]
Wu, Derek Michael [1 ]
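Affiliation
[1] Univ Maryland, College Pk, MD 20742 USA
Source
PROCEEDINGS OF THE 2017 ACM WEB SCIENCE CONFERENCE (WEBSCI '17) | 2017
Funding
National Science Foundation (USA)
Keywords
online harassment; datasets;
DOI
10.1145/3091478.3091509
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract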
A fundamental part of conducting cross-disciplinary web science research is having useful, high-quality datasets that provide value to studies across disciplines. In this paper, we introduce a large, hand-coded corpus of online harassment data. A team of researchers collaboratively developed a codebook using grounded theory and used it to label 35,000 tweets. The resulting dataset has roughly 15% positive (harassment) examples and 85% negative examples. This data is useful for training machine learning models, for identifying textual and linguistic features of online harassment, and for studying the nature of harassing comments and the culture of trolling.
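Because only about 15% of the examples are positive, a classifier trained on this corpus may benefit from some correction for class imbalance. The sketch below is not taken from the paper. It shows one common heuristic, inverse-frequency ("balanced") class weights computed as n_samples / (n_classes * count), the same formula scikit-learn uses for `class_weight="balanced"`. The 1/0 label encoding, the helper name `class_weights`, and the exact positive count (35,000 × 0.15, since the abstract only says "roughly 15%") are illustrative assumptions.

```python
from collections import Counter


def class_weights(labels):
    """Hypothetical helper: inverse-frequency ("balanced") weights,
    n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Assumed encoding: 1 = harassment, 0 = not harassment.
# Counts approximate the reported split (35,000 tweets, roughly 15% positive).
TOTAL = 35_000
n_positive = round(0.15 * TOTAL)  # approximation of "roughly 15%"
labels = [1] * n_positive + [0] * (TOTAL - n_positive)

weights = class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight (count × weight = n_samples / n_classes), so the roughly 5:1 majority of negative examples no longer dominates the training loss.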
Pages: 229-233
Page count: 5