CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4
Authors
Rubel Schneider, Elisa Terumi [1 ]
Gumiel, Yohan Bonescki [2 ]
Andrioli de Souza, Joao Vitor [3 ]
Mukai, Lilian Mie [2 ]
Silva e Oliveira, Lucas Emanuel [3 ]
Rebelo, Marina de Sa [4 ]
Gutierrez, Marco Antonio [4 ]
Krieger, Jose Eduardo [4 ]
Teodoro, Douglas [5 ]
Moro, Claudia [1 ]
Paraiso, Emerson Cabrera [1 ]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Source
2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023
Keywords
natural language processing; transformer; clinical texts; language model
DOI
10.1109/CBMS58004.2023.00247
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual word embeddings and the Transformer architecture have reached state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite these advances in model reuse and construction, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative enough of all medical specialties. This work explores deep contextual embedding models for Portuguese to support clinical NLP tasks. We transferred learned information from electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition (NER) experiments, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching the state of the art on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with findings for other languages.
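The reported gains are F1 scores for NER. As a reference point for how such numbers are typically computed, the sketch below implements entity-level micro F1 over BIO tag sequences; the helper names are illustrative and this is not the authors' evaluation code.

```python
def spans(tags):
    """Extract (type, start, end) entity spans from a BIO tag sequence.

    A stray I- tag whose type does not continue the current entity is
    ignored rather than treated as an entity start.
    """
    out, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        ):
            if start is not None:
                out.append((etype, start, i))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return out

def entity_f1(gold, pred):
    """Micro-averaged entity-level F1: a predicted entity counts as correct
    only if both its span boundaries and its type match the gold entity."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = set(spans(g)), set(spans(p))
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, if the gold sequence contains a Disorder and a Drug entity but the model finds only the Disorder, precision is 1.0, recall 0.5, and F1 ≈ 0.67. In practice, libraries such as seqeval implement this evaluation scheme.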
Pages: 378 - 381
Number of pages: 4
    [J]. PROCEEDINGS OF THE TENTH ACM CONFERENCE ON LEARNING @ SCALE, L@S 2023, 2023, : 336 - 340