Multi-Label Few-Shot ICD Coding as Autoregressive Generation with Prompt

Cited by: 0
Authors
Yang, Zhichao [1]
Kwon, Sunjae [1]
Yao, Zonghai [1]
Yu, Hong [1,2,3]
Affiliations
[1] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
[2] Univ Massachusetts Lowell, Dept Comp Sci, Lowell, MA USA
[3] Vet Affairs Bedford Healthcare Syst, Ctr Healthcare Org & Implementat Res, Bedford, MA USA
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 4 | 2023
Funding
US National Institutes of Health; US National Science Foundation
DOI
not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note, which averages 3,000+ tokens. The task is challenging due to the high-dimensional multi-label space (155,000+ candidate ICD codes) and the long-tail challenge: many ICD codes are assigned infrequently, yet these infrequent codes are clinically important. This study addresses the long-tail challenge by transforming the multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective that generates free-text diagnoses and procedures using the SOAP structure, the medical logic physicians follow when documenting notes. Second, instead of predicting directly in the high-dimensional space of ICD codes, our model generates lower-dimensional text descriptions, from which the ICD codes are then inferred. Third, we design a novel prompt template for multi-label classification. We evaluate our Generation with Prompt (GP(soap)) model on the all-code assignment benchmark (MIMIC-III-full) and the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, substantially outperforming both the previous MIMIC-III-full SOTA model (macro F1 4.3) and a model designed specifically for the few/zero-shot setting (macro F1 18.7). Finally, we design a novel ensemble learner, a cross-attention reranker with prompts, to integrate the previous SOTA predictions with our best few-shot coding predictions. Experiments on MIMIC-III-full show that the ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.
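Since the abstract's headline results hinge on the difference between macro and micro F1, a minimal illustrative sketch (not code from the paper) may help: macro F1 averages per-code F1 over all candidate codes, so rare (long-tail) codes weigh as much as frequent ones, whereas micro F1 pools counts over all decisions and is dominated by frequent codes. The function names and the toy codes "A"/"B"/"C" below are hypothetical.

```python
from collections import defaultdict

def f1(tp, fp, fn):
    """Standard F1 from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_micro_f1(gold, pred, labels):
    """gold/pred: one set of codes per note; labels: all candidate codes."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        for c in labels:
            if c in p and c in g:
                tp[c] += 1          # code predicted and correct
            elif c in p:
                fp[c] += 1          # code predicted but absent from gold
            elif c in g:
                fn[c] += 1          # gold code the model missed
    # Macro: unweighted mean of per-code F1 (rare codes count equally).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: F1 over pooled counts (frequent codes dominate).
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

For example, two notes with gold codes {A, B} and {A} and predictions {A} and {A, C} yield a perfect F1 on the frequent code A but zero on the rare codes B and C, so macro F1 is pulled down to 1/3 while micro F1 stays at 2/3 — exactly the gap between the paper's macro and micro numbers on MIMIC-III-full.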
Pages: 5366 - 5374
Page count: 9