CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4
Authors
Rubel Schneider, Elisa Terumi [1 ]
Gumiel, Yohan Bonescki [2 ]
Andrioli de Souza, Joao Vitor [3 ]
Mukai, Lilian Mie [2 ]
Silva e Oliveira, Lucas Emanuel [3 ]
Rebelo, Marina de Sa [4 ]
Gutierrez, Marco Antonio [4 ]
Krieger, Jose Eduardo [4 ]
Teodoro, Douglas [5 ]
Moro, Claudia [1 ]
Paraiso, Emerson Cabrera [1 ]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Source
2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023
Keywords
natural language processing; transformer; clinical texts; language model;
DOI
10.1109/CBMS58004.2023.00247
CLC Classification Number
TP18 [Artificial intelligence theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Contextual word embeddings and the Transformer architecture have reached state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite these advances in model reuse and construction, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative of all medical specialties. This work explores deep contextual embedding models for Portuguese to support clinical NLP tasks. We transferred learned information from electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition (NER) experiments, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching state-of-the-art results on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with findings for other languages.
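The NER results above are reported as entity-level F1 scores. As a rough illustration only (this is not the authors' evaluation script, and the helper names are hypothetical), entity-level F1 over BIO-tagged sequences can be sketched as:

```python
# Minimal sketch of entity-level F1 for BIO-tagged NER output.
# Hypothetical helpers for illustration; not the paper's evaluation code.

def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any open entity
                entities.append((start, i, etype))
            start, etype = i, tag[2:]      # open a new entity
        elif tag.startswith("I-") and etype == tag[2:]:
            continue                       # extend the current entity
        else:
            if start is not None:          # "O" or mismatched "I-" closes it
                entities.append((start, i, etype))
            start, etype = None, None
    if start is not None:                  # entity running to sequence end
        entities.append((start, len(tags), etype))
    return set(entities)

def entity_f1(gold_tags, pred_tags):
    """Exact-span, exact-type precision/recall harmonic mean."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, if the gold annotation contains a Disorder and a Drug entity but the model recovers only the Disorder span, precision is 1.0, recall is 0.5, and entity-level F1 is about 0.67.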
Pages: 378 - 381
Page count: 4