Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

Cited by: 36
Authors
Luo, Zhihui [1]
Yetisgen-Yildiz, Meliha [2]
Weng, Chunhua [1]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
Clinical research eligibility criteria; Classification; Hierarchical clustering; Knowledge representation; Unified Medical Language System (UMLS); Machine learning; Feature representation
DOI
10.1016/j.jbi.2011.06.001
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity.
Design: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers.
Measurements: We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naive Bayesian, Nearest Neighbor, and instance-based learning classifier.
Results: The UMLS semantic feature representation outperforms the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency.
Conclusion: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.
© 2011 Elsevier Inc. All rights reserved.
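The abstract describes inducing categories by hierarchically clustering criteria that are represented as UMLS semantic-type features. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it assumes each criterion has already been mapped to a set of UMLS semantic type names, and it uses Jaccard distance with average linkage and a distance cutoff. The function names, the toy data, and the 0.6 threshold are all hypothetical choices made for this example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two sets of semantic types (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative_clusters(features, threshold):
    """Average-linkage agglomerative clustering (illustrative, O(n^3)).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical cutoff).
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(jaccard_distance(features[i], features[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy input: semantic types hand-assigned to five made-up criteria.
criteria = [
    {"Disease or Syndrome"},
    {"Disease or Syndrome", "Temporal Concept"},
    {"Pharmacologic Substance"},
    {"Pharmacologic Substance", "Temporal Concept"},
    {"Age Group"},
]

result = agglomerative_clusters(criteria, threshold=0.6)
print(sorted(result))
```

In this toy input the two disease-related criteria and the two medication-related criteria each merge, while the age criterion stays on its own. In the workflow the abstract outlines, clusters like these would be reviewed and named manually, which is what makes the category induction semi-automatic.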
Pages: 927-935
Page count: 9