CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4
Authors
Rubel Schneider, Elisa Terumi [1 ]
Gumiel, Yohan Bonescki [2 ]
Andrioli de Souza, Joao Vitor [3 ]
Mukai, Lilian Mie [2 ]
Silva e Oliveira, Lucas Emanuel [3 ]
Rebelo, Marina de Sa [4 ]
Gutierrez, Marco Antonio [4 ]
Krieger, Jose Eduardo [4 ]
Teodoro, Douglas [5 ]
Moro, Claudia [1 ]
Paraiso, Emerson Cabrera [1 ]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Source
2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023
Keywords
natural language processing; transformer; clinical texts; language model
DOI
10.1109/CBMS58004.2023.00247
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and eased the adaptation of models to multiple domains. Despite these improvements in model reuse and construction, few resources have been developed for Portuguese, especially in the health domain, and the clinical models available for the language do not cover all medical specialties. This work explores deep contextual embedding models for Portuguese to support clinical NLP tasks. We transferred learned information from the electronic health records of a Brazilian tertiary hospital specializing in cardiology and pre-trained multiple clinical BERT-based models. We evaluated these models in named entity recognition experiments, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in both cardiology and multi-specialty settings, reaching the state of the art on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with findings for other languages.
Pages: 378-381
Page count: 4
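
The abstract describes pre-training BERT-based models on cardiology health records and then fine-tuning them for named entity recognition. As a rough illustration of that second step, the sketch below loads such a checkpoint for token classification with the Hugging Face transformers library; the model id and the BIO label set are assumptions made for illustration, not names confirmed by the paper.

# Minimal sketch of the NER inference setup described in the abstract.
# The checkpoint id "pucpr/cardiobertpt" is a hypothetical placeholder,
# not a confirmed Hugging Face model name, and the label set below is
# illustrative (SemClinBr's Disorder entity with BIO tagging).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_ID = "pucpr/cardiobertpt"  # hypothetical identifier
LABELS = ["O", "B-Disorder", "I-Disorder"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(LABELS)
)

# Tag one Portuguese clinical narrative; a real experiment would first
# fine-tune on the annotated TempClinBr/SemClinBr corpora.
text = "Paciente com insuficiência cardíaca descompensada."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, logits.argmax(dim=-1)[0].tolist()):
    print(f"{token}\t{LABELS[label_id]}")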