Natural language model for automatic identification of Intimate Partner Violence reports from Twitter

Cited by: 18
Authors
Al-Garadi, Mohammed Ali [1]
Kim, Sangmi [2]
Guo, Yuting [1]
Warren, Elise [3]
Yang, Yuan-Chi [1]
Lakamana, Sahithi [1]
Sarker, Abeed [1, 4, 5]
Affiliations
[1] Emory Univ, Sch Med, Dept Biomed Informat, Atlanta, GA 30322 USA
[2] Emory Univ, Sch Nursing, Atlanta, GA USA
[3] Emory Univ, Rollins Sch Publ Hlth, Atlanta, GA USA
[4] Georgia Inst Technol, Dept Biomed Engn, Atlanta, GA USA
[5] Emory Univ, Atlanta, GA USA
Keywords
Intimate partner violence; Domestic violence; Natural language processing; Machine learning; Social media; Agreement; Kappa; Risk
DOI
10.1016/j.array.2022.100217
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Intimate partner violence (IPV) is a preventable public health problem that affects millions of people worldwide. Approximately one in four women are estimated to be or have been victims of severe violence at some point in their lives, irrespective of age, ethnicity, and economic status. Victims often report IPV experiences on social media, and automatic detection of such reports via machine learning may enable improved surveillance and targeted distribution of support and/or interventions for those in need. However, no artificial intelligence system for automatic detection currently exists, and we attempted to address this research gap. We collected posts from Twitter using a list of IPV-related keywords, manually reviewed subsets of the retrieved posts, and prepared annotation guidelines to categorize tweets as IPV-report or non-IPV-report. We annotated 6,348 tweets in total, with an inter-annotator agreement (IAA) of 0.86 (Cohen's kappa) on 1,834 double-annotated tweets. The class distribution in the annotated dataset was highly imbalanced, with only 668 posts (approximately 11%) labeled as IPV-report. We then developed a natural language processing model to identify IPV-reporting tweets automatically. The model achieved classification F1-scores of 0.76 for the IPV-report class and 0.97 for the non-IPV-report class. We conducted post-classification analyses to determine the causes of system errors and to check that the system did not exhibit biases in its decision making, particularly with respect to race and gender. Our automatic model can be an essential component of a proactive social media-based intervention and support framework, while also aiding population-level surveillance and large-scale cohort studies.
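The abstract reports two kinds of metrics: Cohen's kappa for inter-annotator agreement and per-class F1-scores for the classifier. As a minimal sketch of how such figures are computed in general (not the authors' code or data), the following uses the standard definitions. The function names and the toy label lists are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        count_a[label] * count_b[label] for label in set(count_a) | set(count_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def f1_for_class(gold, predicted, label):
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = IPV-report, 0 = non-IPV-report.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
f1_report = f1_for_class(annotator_1, annotator_2, label=1)
f1_non_report = f1_for_class(annotator_1, annotator_2, label=0)
```

Reporting F1 separately for each class matters when one class is rare, as with the roughly 11% IPV-report share here, because a score dominated by the majority class can hide weak performance on the minority class.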
Pages: 8