CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain

Cited by: 9
Authors
Lange, Lukas [1 ,2 ]
Adel, Heike [1 ]
Strötgen, Jannik [1]
Klakow, Dietrich [2]
Affiliations
[1] Bosch Ctr Artificial Intelligence, D-71272 Renningen, Germany
[2] Saarland Univ, Spoken Language Syst Grp, Saarland Informat Campus, D-66111 Saarbrücken, Germany
Keywords
INFORMATION EXTRACTION; SHARED TASK
DOI
10.1093/bioinformatics/btac297
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The field of natural language processing (NLP) has recently seen a major shift toward using pre-trained language models for solving almost any task. Despite large improvements on benchmark datasets for various tasks, these models often perform sub-optimally in non-standard domains such as the clinical domain, where a large gap between pre-training documents and target documents is observed. In this article, we aim to close this gap with domain-specific training of the language model, and we investigate its effect on a diverse set of downstream tasks and settings.
Results: We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show that CLIN-X outperforms other pre-trained transformer models by a large margin on 10 clinical concept extraction tasks in two languages. In addition, we demonstrate how the transformer model can be further improved with our proposed task- and language-agnostic model architecture based on ensembles over random splits and cross-sentence context. Our studies in low-resource and transfer settings reveal stable model performance despite the lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available. Our results highlight the importance of specialized language models, such as CLIN-X, for concept extraction in non-standard domains, but they also show that our task-agnostic model architecture is robust across the tested tasks and languages, so that domain- or task-specific adaptations are not required.
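The task- and language-agnostic architecture summarized in the abstract combines cross-sentence context with ensembles over random splits. The following is a minimal, hedged sketch of the cross-sentence-context idea on top of a CLIN-X-style encoder; the checkpoint name llange/xlm-roberta-large-english-clinical, the label set, and the helper tag_with_context are assumptions for illustration rather than the authors' released pipeline, and the token-classification head is freshly initialized, so it must be fine-tuned on labeled data before it produces meaningful tags.

    # Hedged sketch: clinical concept extraction with a CLIN-X-style encoder
    # plus cross-sentence context. Model ID and labels are assumptions; the
    # classification head is randomly initialized until fine-tuned.
    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    MODEL_ID = "llange/xlm-roberta-large-english-clinical"  # assumed checkpoint name
    LABELS = ["O", "B-DISEASE", "I-DISEASE"]                # illustrative label set

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_ID, num_labels=len(LABELS)
    )
    model.eval()

    def tag_with_context(prev_sent: str, sent: str, next_sent: str):
        """Encode the target sentence together with its neighbors
        (cross-sentence context), but keep predictions only for tokens
        that fall inside the target sentence."""
        text = " ".join([prev_sent, sent, next_sent])
        t_start = len(prev_sent) + 1      # char offset where the target sentence begins
        t_end = t_start + len(sent)
        enc = tokenizer(text, return_offsets_mapping=True,
                        return_tensors="pt", truncation=True)
        offsets = enc.pop("offset_mapping")[0].tolist()
        with torch.no_grad():
            preds = model(**enc).logits[0].argmax(-1).tolist()
        # Drop special tokens (empty offsets) and context tokens outside the
        # target sentence span.
        return [(text[s:e].strip(), LABELS[p])
                for (s, e), p in zip(offsets, preds)
                if s != e and t_start <= e <= t_end]

    print(tag_with_context(
        "Patient admitted yesterday.",
        "He was diagnosed with pneumonia.",
        "Antibiotics were started.",
    ))

Under the same sketch, the ensemble over random splits would amount to fine-tuning several such models on different random train/development partitions of the labeled data and majority-voting their predicted label sequences.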
Pages: 3267-3274
Page count: 8