Mining Twitter for Adverse Drug Reaction Mentions: A Corpus and Classification Benchmark

Times cited: 0
Authors
Ginn, Rachel [1]
Pimpalkhute, Pranoti [1]
Nikfarjam, Azadeh [1]
Patki, Apurv [1]
O'Connor, Karen [1]
Sarker, Abeed [1]
Smith, Karen [2]
Gonzalez, Graciela [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
[2] Regis Univ, Denver, CO USA
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
adverse drug reactions; twitter; social media; mining; machine learning; biomedicine; pharmacovigilance; classification; natural language processing; AGREEMENT;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification codes
030303; 0501; 050102;
Abstract
With many adults using social media to discuss health information, researchers have begun mining this resource to monitor or detect health conditions at the population level. Twitter in particular has grown to several hundred million users and could be a rich information source for detecting serious medical conditions such as adverse drug reactions (ADRs). However, Twitter also presents unique challenges due to its brevity, lack of structure, and informal language. We present a freely available, manually annotated corpus of 10,822 tweets that can be used to train automated tools to mine Twitter for ADRs. We collected tweets using drug names as keywords, expanding the keyword set with an algorithm that generates misspelled versions of the drug names for maximum coverage. We annotated each tweet for the presence of an ADR mention and, for tweets that contained one, annotated the mention itself (including its span and the UMLS IDs of the ADRs). Inter-annotator agreement for the binary classification had a Kappa value of 0.69, which may be considered substantial (Viera & Garrett, 2005). We evaluated the utility of the corpus by training two classes of machine learning algorithms: Naive Bayes and Support Vector Machines. The results we present validate the usefulness of the corpus for automated mining tasks. The classification corpus is available from http://diego.asu.edu/downloads.
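
For readers unfamiliar with the agreement statistic quoted in the abstract, the following is a minimal sketch of Cohen's kappa for two annotators who each assign a binary label (ADR mention present or absent). The function name `cohens_kappa` and the toy label lists are illustrative assumptions; they are not taken from the paper and do not reproduce its annotation data or its reported value of 0.69.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if p_e == 1.0:
        # Both annotators used a single identical label; agreement is total.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy annotations (1 = tweet mentions an ADR, 0 = it does not).
annotator_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

Because kappa subtracts chance agreement, it is lower than raw percent agreement whenever the two annotators' label distributions overlap, which is why it is usually preferred for skewed tasks such as ADR detection.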
Pages: 8
Related papers
18 entries in total
  • [1] Akay A, 2013, 2013 IEEE POINT-OF-CARE HEALTHCARE TECHNOLOGIES (PHT), P264, DOI 10.1109/PHT.2013.6461335
  • [2] An overview of MetaMap: historical perspective and recent advances
    Aronson, Alan R.
    Lang, Francois-Michel
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2010, 17 (03) : 229 - 236
  • [3] Bian J, 2012, PROCEEDINGS OF THE 2012 INTERNATIONAL WORKSHOP ON SMART HEALTH AND WELLBEING, P25, DOI 10.1145/2389707.2389713
  • [4] Brody S., 2011, PROC C EMPIRICAL MET, P562
  • [5] Carletta J, 1996, COMPUT LINGUIST, V22, P249
  • [6] PRESCRIBER PROFILE AND POSTMARKETING SURVEILLANCE
    INMAN, W
    PEARCE, G
    [J]. LANCET, 1993, 342 (8872) : 658 - 661
  • [7] Keyuan Jiang, 2013, Advanced Data Mining and Applications. 9th International Conference, ADMA 2013. Proceedings: LNCS 8346, P434, DOI 10.1007/978-3-642-53914-5_37
  • [8] Hospital admissions associated with adverse drug reactions: A systematic review of prospective observational studies
    Kongkaew, Chuenjid
    Noyce, Peter R.
    Ashcroft, Darren M.
    [J]. ANNALS OF PHARMACOTHERAPY, 2008, 42 (7-8) : 1017 - 1025
  • [9] A side effect resource to capture phenotypic effects of drugs
    Kuhn, Michael
    Campillos, Monica
    Letunic, Ivica
    Jensen, Lars Juhl
    Bork, Peer
    [J]. MOLECULAR SYSTEMS BIOLOGY, 2010, 6
  • [10] Leaman R, 2010, Proceedings of the 2010 workshop on biomedical natural language processing, P117