A Deep Learning Based Approach to Automate Clinical Coding of Electronic Health Records

Cited by: 0

Authors:

Kumar, Ashutosh [1]

Rathore, Santosh Singh [1]

Affiliations:

[1] ABV Indian Inst Informat Technol & Management, Dept Comp Sci & Engn, Gwalior, India

Source:

BIG DATA ANALYTICS, BDA 2022 | 2022 / Vol. 13773

Keywords:

Machine learning; Electronic health record; Clinical coding; Cosine similarity; Transformers;

DOI:

10.1007/978-3-031-24094-2_7

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Medical records in different electronic formats, such as handwritten notes, diagnosis summaries, lab reports, and electronic PDFs, contain valuable information that can be used for various medical purposes. These health records are currently coded manually or semi-automatically to assign clinical codes (ICD codes) for clinical research and analytics, a process that is time-consuming, expensive, and error-prone. This paper presents a method for automated clinical coding of electronic health records (EHRs) given the patient diagnosis summary and other medical documents. The method uses natural language processing (NLP) techniques that capture knowledge from free-text diagnosis descriptions, perform text matching and semantic mapping, and translate the descriptions into clinical codes. We develop three models for automated clinical coding: a baseline hybrid of Word2vec and cosine similarity, a transformer encoder model, and a BERT (Bidirectional Encoder Representations from Transformers) model. The models are evaluated on the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset, which consists of patient diagnosis descriptions and their corresponding ICD-9 codes. The experimental results show that the BlueBERT-based model achieved an AUC (area under the ROC curve) of 98.9% for top-10 ICD code prediction. On the full MIMIC-III dataset, the transformer model achieved an accuracy of 76.8%, a precision of 61.02%, a recall of 47.22%, an F1-score of 53.2%, and an AUC of 92.1%. The hybrid baseline model and the other transformer encoder model also showed promising results.
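To illustrate the general idea behind matching a free-text diagnosis to code descriptions by cosine similarity, here is a minimal Python sketch. It is not the authors' implementation. It substitutes simple bag-of-words count vectors for Word2vec embeddings so that it runs with the standard library alone. The three ICD-9 descriptions, the names `ICD9_DESCRIPTIONS`, `vectorize`, `cosine_similarity`, and `rank_codes`, and the sample query are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative sample of ICD-9 code descriptions (hand-picked, not the paper's data).
ICD9_DESCRIPTIONS = {
    "428.0": "congestive heart failure unspecified",
    "401.9": "unspecified essential hypertension",
    "250.00": "diabetes mellitus without mention of complication",
}


def vectorize(text):
    """Lowercase the text and return a bag-of-words count vector.

    Assumption: word counts stand in for the Word2vec embeddings
    mentioned in the abstract.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def rank_codes(diagnosis, top_k=3):
    """Return up to top_k (code, score) pairs, highest similarity first."""
    query = vectorize(diagnosis)
    scored = [
        (code, cosine_similarity(query, vectorize(description)))
        for code, description in ICD9_DESCRIPTIONS.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    for code, score in rank_codes("patient admitted with congestive heart failure"):
        print(code, round(score, 3))
```

With dense embeddings in place of word counts, the same ranking step would also capture semantic similarity between differently worded descriptions, which exact token overlap cannot.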

Pages: 104-116

Page count: 13
