The CLASSE GATOR (CLinical Acronym SenSE disambiGuATOR): A Method for predicting acronym sense from neonatal clinical notes

Cited by: 8

Authors:

Kashyap, Aditya ^{[1]}

Burris, Heather ^{[2,3]}

Callison-Burch, Chris ^{[1]}

Boland, Mary Regina ^{[4,5,6,7]}

Affiliations:

[1] Univ Penn, Dept Comp Sci, Philadelphia, PA 19104 USA

[2] Childrens Hosp Philadelphia, Dept Pediat, Div Neonatol, Philadelphia, PA 19104 USA

[3] Univ Penn, Perelman Sch Med, Philadelphia, PA 19104 USA

[4] Univ Penn, Perelman Sch Med, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA

[5] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA

[6] Univ Penn, Ctr Excellence Environm Toxicol, Philadelphia, PA 19104 USA

[7] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA

Source:

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS | 2020 / Vol. 137

Keywords:

Electronic health records; Natural language processing; Secondary reuse; Transfer learning; Extraction

DOI:

10.1016/j.ijmedinf.2020.104101

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Objective: To develop an algorithm for identifying acronym 'sense' from clinical notes without requiring a clinically annotated training set.

Materials and Methods: Our algorithm is called CLASSE GATOR: CLinical Acronym SenSE disambiGuATOR. CLASSE GATOR extracts acronyms and their definitions from PubMed Central (PMC). A logistic regression model is trained on the words associated with specific acronym-definition pairs from PMC. CLASSE GATOR then uses this library of acronym-definition pairs and their corresponding word feature vectors to predict the acronym 'sense' in Beth Israel Deaconess (MIMIC-III) neonatal notes.

Results: We identified 1,257 acronyms and 8,287 definitions, including a random definition, from 31,764 PMC articles on prenatal exposures and 2,227,674 PMC open access articles. The average number of senses (definitions) per acronym was 6.6 (min = 2, max = 50). The average internal 5-fold cross-validation result was 87.9 % (on PMC). We found that 727 unique acronyms (57.29 %) from PMC were present in 105,044 neonatal notes (MIMIC-III). We evaluated acronym prediction performance using 245 manually annotated clinical notes covering 9 distinct acronyms. When applied to clinical notes, CLASSE GATOR achieved an overall accuracy of 63.04 % and outperformed random for 8/9 acronyms (88.89 %). We also compared our algorithm against UMN's acronym set and found that CLASSE GATOR outperformed random for 63.46 % of 52 acronyms when using logistic regression, 75.00 % when using BERT, and 76.92 % when using BioBERT as the prediction algorithm within CLASSE GATOR.

Conclusions: CLASSE GATOR is the first automated acronym sense disambiguation method for clinical notes. Importantly, CLASSE GATOR does not require an expensive manually annotated acronym-definition corpus for training.
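The approach described in the abstract (train on context words from literature sentences where the definition is known, then predict the sense of the same acronym in a clinical note) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy sentences, the acronym "PDA", the function names, and the hyperparameters are hypothetical and do not come from the paper. The classifier is a small pure-Python softmax logistic regression over bag-of-words counts, which reduces to ordinary logistic regression when an acronym has two senses; the authors' actual features and tooling are not specified in this record.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (context sentence, definition) pairs for the acronym
# "PDA", standing in for sentences harvested from literature. Not from the paper.
TRAIN = [
    ("echocardiogram showed a large PDA with left to right shunt", "patent ductus arteriosus"),
    ("indomethacin was given to close the PDA in the preterm infant", "patent ductus arteriosus"),
    ("murmur consistent with PDA heard on cardiac exam", "patent ductus arteriosus"),
    ("survey responses were collected on a handheld PDA device", "personal digital assistant"),
    ("nurses entered data using PDA software in the clinic", "personal digital assistant"),
    ("the PDA application synchronized records with the server", "personal digital assistant"),
]

def tokenize(text):
    # Lowercase, split on whitespace, and drop the acronym itself so that only
    # the surrounding context words act as features.
    return [w for w in text.lower().split() if w != "pda"]

def predict_proba(model, text):
    # Softmax over per-sense linear scores; words unseen in training score 0.
    weights, bias = model
    scores = {s: bias[s] + sum(weights[s].get(w, 0.0) for w in tokenize(text))
              for s in weights}
    top = max(scores.values())
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}

def train(examples, epochs=200, lr=0.1):
    # Stochastic gradient descent on the cross-entropy loss.
    senses = sorted({sense for _, sense in examples})
    weights = {s: {} for s in senses}
    bias = {s: 0.0 for s in senses}
    for _ in range(epochs):
        for text, gold in examples:
            counts = Counter(tokenize(text))
            probs = predict_proba((weights, bias), text)
            for s in senses:
                grad = probs[s] - (1.0 if s == gold else 0.0)
                bias[s] -= lr * grad
                for w, c in counts.items():
                    weights[s][w] = weights[s].get(w, 0.0) - lr * grad * c
    return weights, bias

def predict(model, text):
    # Return the sense with the highest predicted probability.
    probs = predict_proba(model, text)
    return max(probs, key=probs.get)

model = train(TRAIN)
note = "PDA closed after indomethacin in preterm infant"  # hypothetical note text
print(predict(model, note), predict_proba(model, note))
```

The design point this sketch mirrors is that the training sentences come from a source where the definition is known, so no manually annotated clinical notes are needed at training time; the clinical note is only seen at prediction time.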


Pages: 10
