Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

Cited by: 34
Authors
Luo, Zhihui [1]
Yetisgen-Yildiz, Meliha [2]
Weng, Chunhua [1]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
Clinical research eligibility criteria; Classification; Hierarchical clustering; Knowledge representation; Unified Medical Language System (UMLS); Machine learning; Feature representation
DOI
10.1016/j.jbi.2011.06.001
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Objective: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity.
Design: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers.
Measurements: We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naive Bayesian, Nearest Neighbor, and instance-based learning classifier.
Results: The UMLS semantic feature representation outperforms the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency.
Conclusion: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.
© 2011 Elsevier Inc. All rights reserved.
Pages: 927-935
Page count: 9
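To make the design summarized in the abstract more concrete, here is a minimal sketch of inducing groups of criteria by agglomerative (hierarchical) clustering over semantic-type features. It is an illustration under stated assumptions, not the authors' implementation: the toy criteria, the label sets, the Jaccard distance, the average-linkage rule, and the 0.6 cut-off are all hypothetical choices made for this example.

```python
from itertools import combinations

# Hypothetical toy data: each criterion is represented by a set of
# semantic-type labels (illustrative only, not the paper's feature set).
criteria = {
    "age >= 18 years": {"Age Group", "Quantitative Concept"},
    "older than 65 years": {"Age Group", "Quantitative Concept", "Temporal Concept"},
    "history of diabetes mellitus": {"Disease or Syndrome"},
    "diagnosed with hypertension": {"Disease or Syndrome", "Finding"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two label sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, features):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(features[x], features[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(features, threshold):
    """Merge the closest pair of clusters until no pair is within threshold."""
    clusters = [[name] for name in features]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (i, j), dist = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j], features))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Assumed cut-off of 0.6; a real study would choose this empirically.
clusters = agglomerative_clusters(criteria, threshold=0.6)
for members in clusters:
    print(members)
```

On this toy input the two age-related criteria end up in one cluster and the two disease-related criteria in another. For real data, a library routine such as `scipy.cluster.hierarchy.linkage` would be a more typical choice than this hand-rolled loop.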