Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Cited by: 3
Authors
de Lima Santos, Diego Bernardes [1]
de Carvalho Dutra, Frederico Giffoni [2]
Parreiras, Fernando Silva [3]
Brandao, Wladmir Cardoso [1]
Affiliations
[1] Pontifical Catholic Univ Minas Gerais PUC Minas, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Co Energet Minas Gerais CEMIG, Belo Horizonte, MG, Brazil
[3] FUMEC Univ, Lab Adv Informat Syst, Belo Horizonte, MG, Brazil
Source
PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1 | 2021
Keywords
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; MODELS;
DOI
10.5220/0010443204730483
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to extract named entities from relevant fragments of text. Training models for a specific language usually leads to effective recognition, but it requires substantial time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is unclear how effective the resulting recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.
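The abstract reports precision and F1 for named entity recognition. This record does not state the paper's exact evaluation protocol (for example, which HAREM scenario or whether partial matches earn credit). The sketch below therefore shows only one common way such scores are computed: exact-match, entity-level precision, recall, and F1 over sentences tagged in the BIO scheme. The BIO scheme, the tag names (PER, LOC, ORG), and the helper names `bio_to_spans` and `entity_prf` are illustrative assumptions and do not come from the paper. The sketch does not reproduce the reported numbers.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence tagged in the BIO scheme (assumed)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start: Optional[int] = None
    # A sentinel "O" closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            # Close the currently open entity, if any.
            if etype is not None and start is not None:
                spans.add((etype, start, i))
            # B-X opens a new entity; a stray I-X is treated as if it were B-X.
            etype, start = (tag[2:], i) if tag != "O" else (None, None)
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over parallel sentences."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # same type and same boundaries
        fp += len(p - g)  # predicted entities absent from the gold standard
        fn += len(g - p)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: the second entity of sentence 1 gets the wrong type.
gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"], ["B-ORG", "O"]]
precision, recall, f1 = entity_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # P=0.667 R=0.667 F1=0.667
```

In this scoring, a prediction with the right boundaries but the wrong type counts as both a false positive and a false negative. This is why the single type error in the toy data lowers precision and recall alike.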
Pages: 473-483
Number of pages: 11
Related papers (50 in total)
  • [41] LearningToAdapt with word embeddings: Domain adaptation of Named Entity Recognition systems
    Nozza, Debora
    Manchanda, Pikakshi
    Fersini, Elisabetta
    Palmonari, Matteo
    Messina, Enza
    INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (03)
  • [42] Research on the Named Entity Recognition for Rail Fault Text Based on Distant Supervision
    Cai, Yi
    Su, Shuai
    Li, Zheng
    Han, Qinglong
    Zhang, Jianxia
    2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC, 2023, : 939 - 944
  • [43] Text Summarization based Named Entity Recognition for Certain Application using BERT
    Tummala, Indira Priyadarshini
    2024 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT CYBER PHYSICAL SYSTEMS AND INTERNET OF THINGS, ICOICI 2024, 2024, : 1136 - 1141
  • [44] An Association Rule Mining Method Based on Named Entity Recognition and Text Classification
    He, Bo
    Zhang, Jiru
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (02) : 1503 - 1511
  • [46] Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language
    Khairunnisa, Siti Oryza
    Chen, Zhousi
    Komachi, Mamoru
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [47] A Feature Based Simple Machine Learning Approach with Word Embeddings to Named Entity Recognition on Tweets
    Taspinar, Mete
    Ganiz, Murat Can
    Acarman, Tankut
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017, 2017, 10260 : 254 - 259
  • [48] TEXT SEGMENTATION USING NAMED ENTITY RECOGNITION AND CO-REFERENCE RESOLUTION
    Fragkou, Pavlina
    ICAART 2011: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2011, : 349 - 354
  • [49] Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network
    Wu, Yonghui
    Jiang, Min
    Lei, Jianbo
    Xu, Hua
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216 : 624 - 628
  • [50] HDCNN-CRF for Biomedical Text Named Entity Recognition
    Gao, Mingyuan
    Wei, Hao
    Chen, Fei
    Qu, Wen
    Lu, Mingyu
    PROCEEDINGS OF 2019 IEEE 10TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2019), 2019, : 191 - 194