Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

Cited by: 36

Authors:

Luo, Zhihui [1]

Yetisgen-Yildiz, Meliha [2]

Weng, Chunhua [1]

Affiliations:

[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA

[2] Univ Washington, Seattle, WA 98195 USA

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2011 / Volume 44 / Issue 06

Keywords:

Clinical research eligibility criteria; Classification; Hierarchical clustering; Knowledge representation; Unified Medical Language System (UMLS); Machine learning; Feature representation

DOI:

10.1016/j.jbi.2011.06.001

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Objective: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity. Design: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers. Measurements: We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naive Bayesian, Nearest Neighbor, and instance-based learning classifier. Results: The UMLS semantic feature representation outperforms the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency. Conclusion: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text. (C) 2011 Elsevier Inc. All rights reserved.
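The category-induction step described in the abstract can be illustrated with a minimal sketch. It assumes each criterion has already been reduced to a set of UMLS semantic type labels, and it uses Jaccard distance with average linkage and a fixed cut-off; the abstract does not state the distance measure, linkage, or threshold, so all three are assumptions made here for illustration. The sample criteria, their type assignments, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of semantic-type labels."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, features):
    """Mean pairwise Jaccard distance between the members of two clusters."""
    total = sum(jaccard_distance(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(features, threshold):
    """Repeatedly merge the closest pair of clusters until the closest
    remaining pair is farther apart than `threshold` (an assumed cut-off)."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        (a, b), dist = min(
            (
                ((x, y), average_linkage(clusters[x], clusters[y], features))
                for x, y in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]


# Hypothetical criteria, each represented by illustrative UMLS semantic types.
criteria_features = [
    {"Disease or Syndrome"},                                             # 0
    {"Disease or Syndrome", "Finding"},                                  # 1
    {"Pharmacologic Substance"},                                         # 2
    {"Pharmacologic Substance", "Therapeutic or Preventive Procedure"},  # 3
    {"Age Group"},                                                       # 4
]

clusters = agglomerative_clusters(criteria_features, threshold=0.6)
print(sorted(clusters))
```

With these toy inputs, criteria that share a semantic type are grouped together and the age-related criterion stays on its own. In practice the induced clusters would still need manual review before being named as categories, which is consistent with the abstract describing the induction as semi-automatic.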

Citation

Pages: 927-935

Page count: 9

