CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4
Authors
Rubel Schneider, Elisa Terumi [1 ]
Gumiel, Yohan Bonescki [2 ]
Andrioli de Souza, Joao Vitor [3 ]
Mukai, Lilian Mie [2 ]
Silva e Oliveira, Lucas Emanuel [3 ]
Rebelo, Marina de Sa [4 ]
Gutierrez, Marco Antonio [4 ]
Krieger, Jose Eduardo [4 ]
Teodoro, Douglas [5 ]
Moro, Claudia [1 ]
Paraiso, Emerson Cabrera [1 ]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Source
2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023
Keywords
natural language processing; transformer; clinical texts; language model
DOI
10.1109/CBMS58004.2023.00247
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and eased the adaptation of models to multiple domains. Despite these improvements in model reuse and construction, few resources have been developed for Portuguese, especially in the health domain, and the clinical models available for the language do not cover all medical specialties. This work explores deep contextual embedding models for Portuguese to support clinical NLP tasks. We transferred learned information from the electronic health records of a Brazilian tertiary hospital specializing in cardiology and pre-trained multiple clinical BERT-based models. We evaluated these models in named entity recognition experiments, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in both cardiology and multi-specialty settings, reaching the state of the art on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with findings for other languages.
Pages: 378-381
Page count: 4
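
The abstract describes pre-training BERT-based models on cardiology health records and then fine-tuning them for named entity recognition. As a rough illustration of that second step, the sketch below loads such a checkpoint for token classification with the Hugging Face transformers library; the model id and the BIO label set are assumptions made for illustration, not names confirmed by the paper.

# Minimal sketch of the NER inference setup described in the abstract.
# The checkpoint id "pucpr/cardiobertpt" is a hypothetical placeholder,
# not a confirmed Hugging Face model name, and the label set below is
# illustrative (SemClinBr's Disorder entity with BIO tagging).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_ID = "pucpr/cardiobertpt"  # hypothetical identifier
LABELS = ["O", "B-Disorder", "I-Disorder"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID, num_labels=len(LABELS)
)

# Tag one Portuguese clinical narrative; a real experiment would first
# fine-tune on the annotated TempClinBr/SemClinBr corpora.
text = "Paciente com insuficiência cardíaca descompensada."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, logits.argmax(dim=-1)[0].tolist()):
    print(f"{token}\t{LABELS[label_id]}")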