The CLASSE GATOR (CLinical Acronym SenSE disambiGuATOR): A Method for predicting acronym sense from neonatal clinical notes

Times cited: 8
Authors
Kashyap, Aditya [1]
Burris, Heather [2,3]
Callison-Burch, Chris [1]
Boland, Mary Regina [4,5,6,7]
Affiliations
[1] Univ Penn, Dept Comp Sci, Philadelphia, PA 19104 USA
[2] Childrens Hosp Philadelphia, Dept Pediat, Div Neonatol, Philadelphia, PA 19104 USA
[3] Univ Penn, Perelman Sch Med, Philadelphia, PA 19104 USA
[4] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
[5] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[6] Univ Penn, Ctr Excellence Environm Toxicol, Philadelphia, PA 19104 USA
[7] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
Keywords
Electronic health records; Natural language processing; Secondary reuse; Transfer learning; Extraction
DOI
10.1016/j.ijmedinf.2020.104101
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Objective: To develop an algorithm that identifies the 'sense' of an acronym in clinical notes without requiring a clinically annotated training set.
Materials and Methods: Our algorithm is called CLASSE GATOR (CLinical Acronym SenSE disambiGuATOR). CLASSE GATOR extracts acronyms and their definitions from PubMed Central (PMC) and trains a logistic regression model on the words associated with each acronym-definition pair in PMC. It then uses this library of acronym-definitions and their corresponding word feature vectors to predict acronym sense in neonatal notes from Beth Israel Deaconess Medical Center (MIMIC-III).
Results: We identified 1,257 acronyms and 8,287 definitions (including a random definition) from 31,764 PMC articles on prenatal exposures and 2,227,674 PMC open access articles. The average number of senses (definitions) per acronym was 6.6 (min = 2, max = 50), and the average internal 5-fold cross-validation performance on PMC was 87.9%. Of the acronyms identified in PMC, 727 unique acronyms (57.29%) were present in 105,044 neonatal notes (MIMIC-III). We evaluated acronym prediction on 245 manually annotated clinical notes covering 9 distinct acronyms: applied to these clinical notes, CLASSE GATOR achieved an overall accuracy of 63.04% and outperformed a random baseline for 8/9 acronyms (88.89%). We also compared our algorithm against the University of Minnesota (UMN) acronym set and found that CLASSE GATOR outperformed random for 63.46% of 52 acronyms when using logistic regression, 75.00% when using BERT, and 76.92% when using BioBERT as the prediction algorithm within CLASSE GATOR.
Conclusions: CLASSE GATOR is the first automated acronym sense disambiguation method for clinical notes. Importantly, it does not require an expensive manually annotated acronym-definition corpus for training.
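The Materials and Methods summary describes three steps: harvest acronym-definition pairs from PMC text, fit a logistic regression per ambiguous acronym on the words surrounding each pair, then score an unseen clinical note. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The "long form (SHORT)" regular expression with an initials check is a simplified stand-in for whatever extraction rules CLASSE GATOR actually uses; the features (all other words in the passage, with the long form and acronym hidden) are an assumption; the classifier is a from-scratch multinomial logistic regression trained by plain SGD. All function names and the toy `PMC_SNIPPETS` passages are hypothetical and invented for illustration.

```python
import math
import re
from collections import defaultdict

# Hypothetical simplified extractor: "long form (SHORT)" where the initials of the
# last len(SHORT) words spell SHORT. This is NOT the paper's published extraction rule.
PAIR_RE = re.compile(r"((?:[A-Za-z-]+\s+){1,8})\(([A-Z]{2,10})\)")


def tokenize(text):
    """Lowercase alphabetic tokens (a deliberately crude bag-of-words tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_pairs(text):
    """Return (acronym, definition) pairs found in a PMC-style passage."""
    pairs = []
    for match in PAIR_RE.finditer(text):
        words, short = match.group(1).split(), match.group(2)
        candidate = words[-len(short):]
        initials = "".join(word[0] for word in candidate).upper()
        if len(candidate) == len(short) and initials == short:
            pairs.append((short, " ".join(candidate).lower()))
    return pairs


def build_training_set(docs):
    """Map each acronym to (features, definition) rows built from its passage."""
    rows = defaultdict(list)
    for doc in docs:
        tokens = tokenize(doc)
        for short, definition in extract_pairs(doc):
            # Hide the long form and the acronym: clinical notes rarely spell them out.
            hidden = set(tokenize(definition)) | {short.lower()}
            features = {t: 1.0 for t in tokens if t not in hidden}
            rows[short].append((features, definition))
    return rows


def score(weights, bias, features):
    """Linear score w.x + b; unseen words contribute nothing."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in features.items())


def softmax(scores):
    """Numerically stable softmax over a {class: score} dict."""
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_softmax(examples, labels, epochs=200, lr=0.5):
    """Multinomial logistic regression fitted by plain stochastic gradient descent."""
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    bias = {c: 0.0 for c in classes}
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            probs = softmax({c: score(weights[c], bias[c], features) for c in classes})
            for c in classes:
                grad = probs[c] - (1.0 if c == label else 0.0)  # d(cross-entropy)/d(score_c)
                bias[c] -= lr * grad
                for f, v in features.items():
                    weights[c][f] -= lr * grad * v
    return classes, weights, bias


def train_all(docs, epochs=200, lr=0.5):
    """One classifier per acronym that has at least two senses in the corpus."""
    models = {}
    for short, rows in build_training_set(docs).items():
        labels = [definition for _, definition in rows]
        if len(set(labels)) >= 2:
            models[short] = train_softmax([f for f, _ in rows], labels, epochs, lr)
    return models


def predict_sense(models, acronym, note):
    """Pick the highest-scoring definition given the words of a clinical note."""
    classes, weights, bias = models[acronym]
    features = {t: 1.0 for t in tokenize(note) if t != acronym.lower()}
    return max(classes, key=lambda c: score(weights[c], bias[c], features))


# Toy stand-ins for PMC passages (invented for illustration, not from the paper's corpus).
PMC_SNIPPETS = [
    "Preterm infants with patent ductus arteriosus (PDA) received indomethacin and echocardiography confirmed ductal closure.",
    "Echocardiography in the neonatal unit showed a hemodynamically significant patent ductus arteriosus (PDA) treated with ibuprofen.",
    "Coronary angiography showed stenosis of the posterior descending artery (PDA) in adults with chest pain.",
    "The right coronary artery gives rise to the posterior descending artery (PDA) supplying the inferior septum.",
]

if __name__ == "__main__":
    models = train_all(PMC_SNIPPETS)
    note = "Preterm neonatal infant, echocardiography shows PDA, started indomethacin."
    print(predict_sense(models, "PDA", note))
```

Plain SGD without regularization is used here only to keep the sketch dependency-free. A practical pipeline would more likely rely on a library logistic regression and measure it with k-fold cross-validation, and, as the abstract reports, the classifier inside CLASSE GATOR can also be swapped for BERT or BioBERT.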
Pages: 10