A proposal to identify stakeholders from news for the institutional relationship management activities of an institution based on Named Entity Recognition using BERT
Cited by: 1
Authors:
Messias da Silva, Eric Hans [1]
Laterza, Joao [1]
Pereira da Silva, Marcos Paulo [1]
Ladeira, Marcelo [1]
Affiliation:
[1] Univ Brasilia, Programa Posgrad Comp Aplicada, Brasilia, DF, Brazil
Source:
20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021)
Year: 2021
Keywords:
transfer learning;
fine tuning;
pre-trained language models;
BERT Portuguese;
BERT Multilingual;
DOI:
10.1109/ICMLA52953.2021.00251
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
For an organization's institutional relationship activities, an efficient process for identifying and characterizing stakeholders from available information is strategic. Given the increasing volume of available data, this process is commonly supported by information technology solutions, with high potential for data mining techniques such as textual analysis and natural language processing (NLP). In this work we analyze the feasibility of a Named Entity Recognition (NER) mechanism based on Bidirectional Encoder Representations from Transformers (BERT) with a Conditional Random Field (CRF), which could in the future replace rule-based identification as the stakeholder identification solution. We applied the proposed solution to a news dataset to evaluate its performance. The experimental results showed that pre-trained Portuguese models outperformed Multilingual ones by at least 3.43 percentage points on the test dataset. We also added a post-processing step, Prediction Masking, which corrects invalid tagging-scheme transitions and improved the micro F1 score on both datasets by 0.38 to 1.29 percentage points. Thus, we achieved the objective of improving stakeholder detection by proposing a NER model that far surpasses the naive rule-based approach of the current application, which consisted of an exact text match of stakeholders against a manually built dictionary.
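The abstract does not say which tagging scheme is used or how the masking of invalid transitions is implemented. As a rough illustration only, the Python sketch below assumes a BIO scheme and a greedy decoder over per-token label scores: at each token, labels that would form an invalid transition from the previously chosen label (for example `I-ORG` directly after `O`) are excluded before taking the best score. The function names, the label set, and the scores are all hypothetical and are not taken from the paper; with a CRF layer, a comparable constraint might instead be applied to the transition scores.

```python
from typing import List, Sequence


def is_valid_transition(prev: str, curr: str) -> bool:
    """Assumed BIO rule: I-X may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True  # "O" and any "B-X" are always allowed


def masked_greedy_decode(
    scores: Sequence[Sequence[float]], labels: Sequence[str]
) -> List[str]:
    """Pick, token by token, the best-scoring label that keeps the sequence valid."""
    decoded: List[str] = []
    prev = "O"  # treat the sentence start like an "O" tag
    for token_scores in scores:
        # Mask out labels that would create an invalid transition from `prev`.
        allowed = [i for i, lab in enumerate(labels) if is_valid_transition(prev, lab)]
        best = max(allowed, key=lambda i: token_scores[i])
        prev = labels[best]
        decoded.append(prev)
    return decoded


# Hypothetical label set and per-token scores (one row per token).
LABELS = ["O", "B-ORG", "I-ORG", "B-PER", "I-PER"]
scores = [
    [0.10, 0.20, 0.60, 0.05, 0.05],  # plain argmax would pick I-ORG at sentence start
    [0.10, 0.10, 0.30, 0.10, 0.40],  # plain argmax would pick I-PER after an ORG tag
    [0.70, 0.10, 0.10, 0.05, 0.05],
]
print(masked_greedy_decode(scores, LABELS))
```

Because `O` is always a valid next label, the set of allowed labels is never empty, so the decoder always returns one tag per token.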