CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4
Authors
Rubel Schneider, Elisa Terumi [1]
Gumiel, Yohan Bonescki [2]
Andrioli de Souza, Joao Vitor [3]
Mukai, Lilian Mie [2]
Silva e Oliveira, Lucas Emanuel [3]
Rebelo, Marina de Sa [4]
Gutierrez, Marco Antonio [4]
Krieger, Jose Eduardo [4]
Teodoro, Douglas [5]
Moro, Claudia [1]
Paraiso, Emerson Cabrera [1]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Keywords
natural language processing; transformer; clinical texts; language model
DOI
10.1109/CBMS58004.2023.00247
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite these advances in building and reusing models, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative enough of all medical specialties. This work explores deep contextual embedding models for the Portuguese language to support clinical NLP tasks. We transferred learned information from electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition experiments, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching state-of-the-art results on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with findings for other languages.
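The pipeline summarized above (a BERT checkpoint adapted to clinical text, then fine-tuned for named entity recognition) corresponds to a standard token-classification setup. Below is a minimal sketch of the fine-tuning step, assuming the Hugging Face transformers library; the general-domain Portuguese BERTimbau checkpoint stands in for the paper's clinical models, and the BIO label set for a Disorder entity (echoing SemClinBr) is purely illustrative, not the authors' exact configuration.

    # Minimal fine-tuning sketch for clinical NER (token classification).
    # Assumptions: the general-domain Portuguese BERTimbau checkpoint stands
    # in for CardioBERTpt, and the Disorder BIO tag set is illustrative only.
    from transformers import (
        AutoModelForTokenClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    CHECKPOINT = "neuralmind/bert-base-portuguese-cased"  # stand-in model
    LABELS = ["O", "B-Disorder", "I-Disorder"]  # illustrative BIO scheme

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForTokenClassification.from_pretrained(
        CHECKPOINT,
        num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)),
        label2id={label: i for i, label in enumerate(LABELS)},
    )

    args = TrainingArguments(
        output_dir="clinical-ner",  # where checkpoints are written
        learning_rate=5e-5,
        num_train_epochs=3,
        per_device_train_batch_size=16,
    )

    # train_dataset/eval_dataset would hold tokenized clinical narratives
    # with labels aligned to word pieces (e.g., TempClinBr or SemClinBr):
    # trainer = Trainer(model=model, args=args,
    #                   train_dataset=train_dataset,
    #                   eval_dataset=eval_dataset)
    # trainer.train()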
Pages: 378-381
Page count: 4