Ontology-based categorization of clinical studies by their conditions

Cited by: 6
Authors
Liu, Hao [1]
Carini, Simona [2]
Chen, Zhehuan [1]
Hey, Spencer Phillips [3]
Sim, Ida [2]
Weng, Chunhua [1,4]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, New York, NY USA
[2] Univ Calif San Francisco, Dept Med, San Francisco, CA USA
[3] Prism Analyt Technol, Boston, MA USA
[4] Columbia Univ, Dept Biomed Informat, 622 W 168 ST,PH 20 room 407, New York, NY 10032 USA
Keywords
Ontology; Clinical Study; SNOMED CT; Data Visualization; Categorization; UMLS; TEXT;
DOI
10.1016/j.jbi.2022.104235
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Objective: The free-text Condition data field in ClinicalTrials.gov is not amenable to computational processes for retrieving, aggregating, and visualizing clinical studies by condition categories. This paper contributes a method for automated ontology-based categorization of clinical studies by their conditions.
Materials and Methods: Our method first maps text entries in ClinicalTrials.gov's Condition field to standard condition concepts in the OMOP Common Data Model, using SNOMED CT as the reference ontology and Usagi for concept normalization. It then traverses the SNOMED CT hierarchy for concept expansion, followed by ontology-driven condition categorization and visualization. We compared the accuracy of this method to that of the MeSH-based method.
Results: We reviewed the 4,506 studies on Vivli.org categorized by our method. Condition terms of 4,501 (99.89%) studies were successfully mapped to SNOMED CT concepts, and with a minimum concept mapping score threshold, 4,428 (98.27%) studies were categorized into 31 predefined categories. When validated against manual categorization results on a random sample of 300 studies, our method achieved an estimated categorization accuracy of 95.7%, while the MeSH-based method had an accuracy of 85.0%.
Conclusion: We showed that categorizing clinical studies using their Condition terms with reference to SNOMED CT achieved better accuracy and coverage than using MeSH terms. The proposed ontology-driven condition categorization was useful for creating accurate clinical study categorization that enables clinical researchers to aggregate evidence from a large number of clinical studies.
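The hierarchy-traversal and categorization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy is-a hierarchy, the concept names, the category anchors, the function names, and the 0.8 score threshold are all assumptions made for illustration (the abstract does not state the threshold value, and no real SNOMED CT content or Usagi output is used here).

```python
from collections import deque

# Toy is-a hierarchy (child -> parents). Illustrative names only,
# not real SNOMED CT concepts or identifiers.
PARENTS = {
    "type 2 diabetes mellitus": ["diabetes mellitus"],
    "diabetes mellitus": ["disorder of endocrine system"],
    "non-small cell lung cancer": ["malignant tumor of lung"],
    "malignant tumor of lung": ["neoplastic disease",
                                "disorder of respiratory system"],
}

# Hypothetical category anchors: high-level concept -> category label.
CATEGORY_ANCHORS = {
    "disorder of endocrine system": "Endocrine",
    "neoplastic disease": "Oncology",
    "disorder of respiratory system": "Respiratory",
}


def ancestors(concept):
    """Return the concept plus all its ancestors (breadth-first traversal)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def categorize(mapped_concept, match_score, min_score=0.8):
    """Assign categories to a mapped concept.

    Mappings below the (assumed) minimum score are left uncategorized;
    otherwise every anchor found among the ancestors contributes a category.
    """
    if match_score < min_score:
        return set()
    return {CATEGORY_ANCHORS[c]
            for c in ancestors(mapped_concept)
            if c in CATEGORY_ANCHORS}
```

In this sketch a concept with several parents can fall into more than one category, and a low-scoring mapping yields an empty set, which mirrors the gap the abstract reports between mapped studies (4,501) and categorized studies (4,428).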
Pages: 10