Towards role-based filtering of disease outbreak reports

Cited by: 8
Authors
Doan, Son [1]
Kawazoe, Ai [1]
Conway, Mike [1]
Collier, Nigel [1]
Affiliations
[1] Res Org Informat & Syst, Natl Inst Informat, Chiyoda Ku, Tokyo 1018430, Japan
Funding
Japan Society for the Promotion of Science
Keywords
Text classification; Semantic roles; Named entities; Information extraction
DOI
10.1016/j.jbi.2008.12.009
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
This paper explores the role of named entities (NEs) in the classification of disease outbreak reports. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes, and roles are integrated into NEs as attributes, for example whether a chemical is being used as a therapy for some infectious disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naive Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, and (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both results were statistically significant compared to the baseline "raw text" representation. We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score measures for the text classification task. © 2008 Elsevier Inc. All rights reserved.
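The idea of using NE roles as classifier features can be illustrated with a small sketch. It is not the BioCaster implementation: the role name `therapeutic`, the `(type, role, value)` tuple layout, the helper `featurize`, the class `NaiveBayes`, and the two toy documents are all assumptions made for illustration. The sketch appends one feature per NE type and one per type/role pair to a bag of words, then trains a multinomial naive Bayes model with add-one smoothing.

```python
import math
from collections import Counter, defaultdict


def featurize(tokens, entities):
    """Bag of words plus NE-type and NE-type/role features.

    `entities` is a list of (ne_type, role, value) tuples; this layout
    is hypothetical and chosen only for the sketch.
    """
    feats = [t.lower() for t in tokens]
    for ne_type, role, value in entities:
        feats.append(f"NE={ne_type}")
        if role is not None:
            feats.append(f"NE={ne_type}|{role}={value}")
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            total = sum(self.feat_counts[label].values())
            score = math.log(count / n_docs)
            for f in feats:
                if f in self.vocab:  # features unseen in training are skipped
                    num = self.feat_counts[label][f] + self.alpha
                    score += math.log(num / (total + self.alpha * v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented): same drug name, different role value.
train = [
    (["Tamiflu", "treats", "patients"], [("CHEMICAL", "therapeutic", "true")], "relevant"),
    (["Tamiflu", "sales", "rise"], [("CHEMICAL", "therapeutic", "false")], "reject"),
]
model = NaiveBayes().fit(
    [featurize(tokens, ents) for tokens, ents, _ in train],
    [label for _, _, label in train],
)
```

In this toy setting the word `tamiflu` and the type feature `NE=CHEMICAL` occur equally often in both classes, so only the role feature separates a new document that mentions the drug; the paper's actual experiments use far richer features and real annotated reports.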
Pages: 773-780
Page count: 8