EliXR: an approach to eligibility criteria extraction and representation

Cited by: 93
Authors
Weng, Chunhua [1]
Wu, Xiaoying [2]
Luo, Zhihui [1]
Boland, Mary Regina [1]
Theodoratos, Dimitri [2]
Johnson, Stephen B. [1]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
[2] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
Keywords
TEXT;
DOI
10.1136/amiajnl-2011-000321
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: To develop a semantic representation for clinical research eligibility criteria to automate semistructured information extraction from eligibility criteria text.
Materials and Methods: An analysis pipeline called eligibility criteria extraction and representation (EliXR) was developed that integrates syntactic parsing and tree pattern mining to discover common semantic patterns in 1000 eligibility criteria randomly selected from http://ClinicalTrials.gov. The semantic patterns were aggregated and enriched with Unified Medical Language System (UMLS) semantic knowledge to form a semantic representation for clinical research eligibility criteria.
Results: The authors arrived at 175 semantic patterns, which form 12 semantic role labels connected by their frequent semantic relations in a semantic network.
Evaluation: Three raters independently annotated all the sentence segments (N=396) for 79 test eligibility criteria using the 12 top-level semantic role labels. Eighty-six per cent (339) of the sentence segments were unanimously labelled correctly and 13.8% (55) were correctly labelled by two raters. The Fleiss' kappa was 0.88, indicating nearly perfect interrater agreement.
Conclusion: This study presents a semi-automated, data-driven approach to developing a semantic network that aligns well with the top-level information structure in clinical research eligibility criteria text, and demonstrates the feasibility of using the resulting semantic role labels to generate semistructured eligibility criteria with nearly perfect interrater reliability.
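For readers unfamiliar with the agreement statistic reported in the evaluation, the following is a minimal sketch of how Fleiss' kappa is computed for a fixed number of raters per item. It is not the authors' code: the function name `fleiss_kappa`, the label names `label_a` and `label_b`, and the four-item toy data are all hypothetical, and the sketch does not attempt to reproduce the reported value of 0.88, which depends on the per-label distribution that the abstract does not give.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings: list of per-item label lists, e.g. [["x", "x", "y"], ...]
    categories: list of all labels the raters could choose from
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]

    # Per-item agreement: share of rater pairs that chose the same label.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each label.
    shares = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    expected = sum(p * p for p in shares)

    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 4 segments, 3 raters, 2 made-up labels.
toy = [
    ["label_a", "label_a", "label_a"],
    ["label_b", "label_b", "label_b"],
    ["label_a", "label_a", "label_b"],
    ["label_b", "label_b", "label_b"],
]
kappa = fleiss_kappa(toy, ["label_a", "label_b"])
print(round(kappa, 4))
```

In the study's setting, each inner list would hold the three raters' labels for one sentence segment, and `categories` would be the 12 top-level semantic role labels.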
Pages: I116 / I124
Page count: 9