Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Times cited: 3
Authors
de Lima Santos, Diego Bernardes [1]
de Carvalho Dutra, Frederico Giffoni [2]
Parreiras, Fernando Silva [3]
Brandao, Wladmir Cardoso [1]
Affiliations
[1] Pontifical Catholic Univ Minas Gerais PUC Minas, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Co Energet Minas Gerais CEMIG, Belo Horizonte, MG, Brazil
[3] FUMEC Univ, Lab Adv Informat Syst, Belo Horizonte, MG, Brazil
Source
PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021
Keywords
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; Models
DOI
10.5220/0010443204730483
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn to extract named entities from relevant text fragments. Training models in a specific language usually leads to effective recognition, but it requires a lot of time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is unclear how effective the resulting recognition model is. This article explores multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging, complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.
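The abstract reports precision and F1 for named entity recognition. As a hedged illustration of how such scores are commonly computed, the following is a minimal sketch of entity-level, micro-averaged precision, recall, and F1 over BIO-tagged token sequences with exact span matching, in the style of CoNLL evaluation. The paper's exact evaluation protocol is not stated in this record, so the BIO scheme, the exact-match criterion, and the helper names `bio_spans` and `entity_prf` are assumptions; the HAREM-style labels in the usage example are illustrative only.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:
                spans.add((etype, start, i))  # close the currently open entity
                etype = None
            if tag.startswith(("B-", "I-")):
                # A stray I- tag is leniently treated as the start of a new entity
                etype, start = tag[2:], i
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1; a span counts only on exact match."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # same type and same boundaries
        fp += len(ps - gs)  # predicted but not in gold
        fn += len(gs - ps)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative labels only: one correct PESSOA span, one wrong type on the last token
    gold = [["B-PESSOA", "I-PESSOA", "O", "B-LOCAL"]]
    pred = [["B-PESSOA", "I-PESSOA", "O", "B-ORGANIZACAO"]]
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Micro-averaging pools counts across all entity types, so frequent categories dominate the score; per-type figures would require grouping spans by type before counting.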
Pages: 473-483
Number of pages: 11