Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese

Cited by: 4
Authors
de Lima Santos, Diego Bernardes [1]
de Carvalho Dutra, Frederico Giffoni [2]
Parreiras, Fernando Silva [3]
Brandao, Wladmir Cardoso [1]
Affiliations
[1] Pontifical Catholic University of Minas Gerais (PUC Minas), Department of Computer Science, Belo Horizonte, MG, Brazil
[2] Companhia Energética de Minas Gerais (CEMIG), Belo Horizonte, MG, Brazil
[3] FUMEC University, Laboratory of Advanced Information Systems, Belo Horizonte, MG, Brazil
Source
Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021), Vol. 1, 2021
Keywords
Named Entity Recognition; Text Embedding; Neural Network; Transformer; Multilingual; Portuguese; Models
DOI
10.5220/0010443204730483
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Recent state-of-the-art named entity recognition (NER) approaches are based on deep neural networks that use an attention mechanism to learn to extract named entities from relevant fragments of text. Training models for a specific language usually yields effective recognition, but it requires considerable time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is unclear how effective the resulting recognition model can be. This article exploits multilingual models for NER by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in NER for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.
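The abstract reports precision and F1 but does not say how they are computed. A common convention for NER is exact-match, entity-level scoring over BIO-tagged tokens, where a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The following Python sketch illustrates that convention under this assumption; the function names `bio_to_spans` and `precision_recall_f1` are hypothetical, and the tag labels (`PESSOA`, `LOCAL`, `ORGANIZACAO`) are illustrative HAREM-style categories rather than the paper's exact label set.

```python
from typing import List, Optional, Set, Tuple

# An entity span: (entity type, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one BIO tag sequence (e.g. "B-PESSOA", "I-PESSOA", "O") into typed spans."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start: Optional[int] = None
    # A sentinel "O" at the end closes an entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, label = tag.partition("-")
        # Close the open entity on "O", on a new "B-", or on an "I-" of another type.
        if etype is not None and (prefix != "I" or label != etype):
            spans.add((etype, start, i))
            etype, start = None, None
        # Open an entity on "B-", or on an "I-" that does not continue one.
        if prefix == "B" or (prefix == "I" and etype is None):
            etype, start = label, i
    return spans


def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity-level precision, recall and F1.

    Assumption: this mirrors common CoNLL-style scoring, not necessarily
    the evaluation procedure used in the paper.
    """
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(gold_spans & pred_spans)  # same type and same boundaries
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold tags `[["B-PESSOA", "I-PESSOA", "O", "B-LOCAL"], ["B-ORGANIZACAO", "O"]]` and predictions that miss only the `LOCAL` entity, both predicted entities are correct, so precision is 1.0, recall is 2/3, and F1 is about 0.8. Scoring whole spans rather than individual tokens means a partially recovered entity earns no credit.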
Pages: 473-483
Number of pages: 11