Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

被引:3
|
作者
de Lima Santos, Diego Bernardes [1 ]
de Carvalho Dutra, Frederico Giffoni [2 ]
Parreiras, Fernando Silva [3 ]
Brandao, Wladmir Cardoso [1 ]
机构
[1] Pontifical Catholic Univ Minas Gerais PUC Minas, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Co Energet Minas Gerais CEMIG, Belo Horizonte, MG, Brazil
[3] FUMEC Univ, Lab Adv Informat Syst, Belo Horizonte, MG, Brazil
关键词
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; MODELS;
D O I
10.5220/0010443204730483
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Recent state of the art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to perform the extraction of named entities from relevant fragments of text. Usually, training models in a specific language leads to effective recognition, but it requires a lot of time and computational resources. However, fine-tuning a pre-trained multilingual model can be simpler and faster, but there is a question on how effective that recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training tranformer-based architectures for Portuguese, a challenging complex language. Experimental results show that multilingual trasformer-based text embeddings approaches fine tuned with a large dataset outperforms state of the art trasformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% of precision and 87.8% in F1 in named entity recognition for Portuguese, with gains of up to 9.89% of precision and 11.60% in F1 compared to the state of the art single-lingual approach trained specifically for Portuguese.
引用
收藏
页码:473 / 483
页数:11
相关论文
共 50 条
  • [1] Combining Word Embeddings for Portuguese Named Entity Recognition
    da Silva, Messias Gomes
    Alves de Oliveira, Hilario Tomaz
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 198 - 208
  • [2] Embeddings for Named Entity Recognition in Geoscience Portuguese Literature
    Consoli, Bernardo
    Santos, Joaquim
    Gomes, Diogo
    Cordeiro, Fabio
    Vieira, Renata
    Moreira, Viviane
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 4625 - 4630
  • [3] Transformer-based approach for joint handwriting and named entity recognition in historical document
    Rouhou, Ahmed Cheikh
    Dhiaf, Marwa
    Kessentini, Yousri
    Ben Salem, Sinda
    PATTERN RECOGNITION LETTERS, 2022, 155 : 128 - 134
  • [4] Named Entity Recognition in Cyber Threat Intelligence Using Transformer-based Models
    Evangelatos, Pavlos
    Iliou, Christos
    Mavropoulos, Thanassis
    Apostolou, Konstantinos
    Tsikrika, Theodora
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    PROCEEDINGS OF THE 2021 IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (IEEE CSR), 2021, : 348 - 353
  • [5] Transformer-Based Named Entity Recognition for Parsing Clinical Trial Eligibility Criteria
    Tian, Shubo
    Erdengasileng, Arslan
    Yang, Xi
    Guo, Yi
    Wu, Yonghui
    Zhang, Jinfeng
    Bian, Jiang
    He, Zhe
    12TH ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS (ACM-BCB 2021), 2021,
  • [6] Practical Transformer-based Multilingual Text Classification
    Wang, Cindy
    Banko, Michele
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 121 - 129
  • [7] Multilingual Named Entity Recognition Using Pretrained Embeddings, Attention Mechanism and NCRF
    Emelyanov, Anton A.
    Artemova, Ekaterina
    7TH WORKSHOP ON BALTO-SLAVIC NATURAL LANGUAGE PROCESSING (BSNLP'2019), 2019, : 94 - 99
  • [8] Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition
    Imai, Sakura
    Kawahara, Daisuke
    Orita, Naho
    Oda, Hiromune
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-SRW 2023, VOL 4, 2023, : 139 - 151
  • [9] Named Entity Recognition in Portuguese Neurology Text Using CRF
    Lopes, Fabio
    Teixeira, Cesar
    Oliveira, Hugo Goncalo
    PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2019, PT I, 2019, 11804 : 336 - 348
  • [10] Enhanced Chinese Named Entity Recognition with Transformer-Based Multi-feature Fusion
    Zhang, Xiaoli
    Zhang, Quan
    Liang, Kun
    Wang, Haoyu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14864 : 132 - 141