CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4
Authors
Rubel Schneider, Elisa Terumi [1 ]
Gumiel, Yohan Bonescki [2 ]
Andrioli de Souza, Joao Vitor [3 ]
Mukai, Lilian Mie [2 ]
Silva e Oliveira, Lucas Emanuel [3 ]
Rebelo, Marina de Sa [4 ]
Gutierrez, Marco Antonio [4 ]
Krieger, Jose Eduardo [4 ]
Teodoro, Douglas [5 ]
Moro, Claudia [1 ]
Paraiso, Emerson Cabrera [1 ]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Keywords
natural language processing; transformer; clinical texts; language model
DOI
10.1109/CBMS58004.2023.00247
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite these advances in model reuse and construction, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative of all medical specialties. This work explores deep contextual embedding models for Portuguese to support clinical NLP tasks. We transferred learned information from the electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated these models on named entity recognition, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching state-of-the-art results on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with findings for other languages.
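A minimal sketch of the two-stage recipe the abstract describes: domain-adaptive masked-language-model (MLM) pre-training on clinical text, followed by token-classification (NER) fine-tuning. It uses the Hugging Face transformers API with the public Portuguese checkpoint neuralmind/bert-base-portuguese-cased (BERTimbau) as a stand-in for the paper's base model, a toy sentence instead of EHR data, and an assumed BIO label set for SemClinBr's Disorder entity; none of these names are taken from the paper itself.

import torch
from transformers import (AutoModelForMaskedLM, AutoModelForTokenClassification,
                          AutoTokenizer, DataCollatorForLanguageModeling)

BASE = "neuralmind/bert-base-portuguese-cased"  # public stand-in, not the paper's checkpoint
tok = AutoTokenizer.from_pretrained(BASE)

# Stage 1: one MLM step on a toy "clinical note" (the paper uses hospital EHR text).
mlm_model = AutoModelForMaskedLM.from_pretrained(BASE)
collator = DataCollatorForLanguageModeling(tok, mlm=True, mlm_probability=0.15)
note = "Paciente com insuficiencia cardiaca e fracao de ejecao reduzida."
batch = collator([tok(note, truncation=True)])  # pads and randomly masks ~15% of tokens
loss = mlm_model(**batch).loss                  # continued pre-training minimizes this
print("MLM loss:", float(loss))                 # a real run optimizes it over many epochs

# Stage 2: a token-classification (NER) head on the adapted encoder. The head here
# is untrained, so predictions are random; the point is the token-level decoding.
labels = ["O", "B-Disorder", "I-Disorder"]      # assumed BIO scheme for the Disorder entity
ner_model = AutoModelForTokenClassification.from_pretrained(
    BASE, num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)})
enc = tok("Paciente com estenose aortica grave.", return_tensors="pt")
with torch.no_grad():
    pred = ner_model(**enc).logits.argmax(-1)[0]  # (seq_len,) label id per subword
for t, p in zip(tok.convert_ids_to_tokens(enc["input_ids"][0]), pred.tolist()):
    print(f"{t:15s} {labels[p]}")

In the paper's setting, stage 2 would instead be trained on the TempClinBr and SemClinBr annotations before F1 is measured.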
Pages: 378-381
Page count: 4