CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain

Cited by: 9
Authors
Lange, Lukas [1 ,2 ]
Adel, Heike [1 ]
Strötgen, Jannik [1]
Klakow, Dietrich [2]
Affiliations
[1] Bosch Ctr Artificial Intelligence, D-71272 Renningen, Germany
[2] Saarland Univ, Spoken Language Syst Grp, Saarland Informat Campus, D-66111 Saarbrücken, Germany
Keywords
INFORMATION EXTRACTION; SHARED TASK
DOI
10.1093/bioinformatics/btac297
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The field of natural language processing (NLP) has recently seen a major shift toward using pre-trained language models for solving almost any task. Despite large improvements on benchmark datasets for various tasks, these models often perform sub-optimally in non-standard domains such as the clinical domain, where a large gap between pre-training documents and target documents is observed. In this article, we aim to close this gap with domain-specific training of the language model, and we investigate its effect on a diverse set of downstream tasks and settings.
Results: We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show that CLIN-X outperforms other pre-trained transformer models by a large margin on 10 clinical concept extraction tasks in two languages. In addition, we demonstrate how the transformer model can be further improved with our proposed task- and language-agnostic model architecture based on ensembles over random splits and cross-sentence context. Our studies in low-resource and transfer settings reveal stable model performance despite the lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available. Our results highlight the importance of specialized language models, such as CLIN-X, for concept extraction in non-standard domains, but they also show that our task-agnostic model architecture is robust across the tested tasks and languages, so that domain- or task-specific adaptations are not required.
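The task- and language-agnostic architecture summarized in the abstract combines cross-sentence context with ensembles over random splits. The following is a minimal, hedged sketch of the cross-sentence-context idea on top of a CLIN-X-style encoder; the checkpoint name llange/xlm-roberta-large-english-clinical, the label set, and the helper tag_with_context are assumptions for illustration rather than the authors' released pipeline, and the token-classification head is freshly initialized, so it must be fine-tuned on labeled data before it produces meaningful tags.

    # Hedged sketch: clinical concept extraction with a CLIN-X-style encoder
    # plus cross-sentence context. Model ID and labels are assumptions; the
    # classification head is randomly initialized until fine-tuned.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    MODEL_ID = "llange/xlm-roberta-large-english-clinical"  # assumed checkpoint name
    LABELS = ["O", "B-DISEASE", "I-DISEASE"]                # illustrative label set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_ID, num_labels=len(LABELS)
    )
    model.eval()

    def tag_with_context(prev_sent: str, sent: str, next_sent: str):
        """Encode the target sentence together with its neighbors
        (cross-sentence context), but keep predictions only for tokens
        that fall inside the target sentence."""
        text = " ".join([prev_sent, sent, next_sent])
        t_start = len(prev_sent) + 1      # char offset where the target sentence begins
        t_end = t_start + len(sent)
        enc = tokenizer(text, return_offsets_mapping=True,
                        return_tensors="pt", truncation=True)
        offsets = enc.pop("offset_mapping")[0].tolist()
        with torch.no_grad():
            preds = model(**enc).logits[0].argmax(-1).tolist()
        # Drop special tokens (empty offsets) and context tokens outside the
        # target sentence span.
        return [(text[s:e].strip(), LABELS[p])
                for (s, e), p in zip(offsets, preds)
                if s != e and t_start <= e <= t_end]

    print(tag_with_context(
        "Patient admitted yesterday.",
        "He was diagnosed with pneumonia.",
        "Antibiotics were started.",
    ))

Under the same sketch, the ensemble over random splits would amount to fine-tuning several such models on different random train/development partitions of the labeled data and majority-voting their predicted label sequences.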
Pages: 3267-3274
Page count: 8