LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text

Cited by: 47
Authors
Luz de Araujo, Pedro Henrique [1]
de Campos, Teofilo E. [2]
de Oliveira, Renato R. R. [1]
Stauffer, Matheus [1]
Couto, Samuel [1]
Bermejo, Paulo [1]
Affiliations
[1] Univ Brasilia UnB, R&D Ctr Excellence & Publ Sect Transformat NEXT, Brasilia, DF, Brazil
[2] Univ Brasilia, Dept Comp Sci, Brasilia, DF, Brazil
Source
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018 | 2018 / Vol. 11122
Keywords
Named entity recognition; Natural language processing; Portuguese processing
DOI
10.1007/978-3-319-99722-3_32
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named entity recognition systems have the untapped potential to extract information from legal documents, which can improve information retrieval and decision-making processes. In this paper, a dataset for named entity recognition in Brazilian legal documents is presented. Unlike other Portuguese language datasets, this dataset is composed entirely of legal documents. In addition to tags for persons, locations, time entities and organizations, the dataset contains specific tags for law and legal case entities. To establish a set of baseline results, we first performed experiments on another Portuguese dataset: Paramopama. This evaluation demonstrates that LSTM-CRF gives results that are significantly better than those previously reported. We then retrained LSTM-CRF on our dataset and obtained F1 scores of 97.04% and 88.82% for Legislation and Legal case entities, respectively. These results show the viability of the proposed dataset for legal applications.
Pages: 313-323
Page count: 11
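The abstract reports per-entity-type F1 scores. As a rough illustration of how such a score can be computed, the sketch below derives an entity-level F1 for one entity type from BIO-tagged sequences, counting a prediction as correct only when its type and boundaries match the gold annotation exactly. This record does not say whether the paper scores at entity or token level, so the exact-match scheme is an assumption; the tag names (`PESSOA`, `LEGISLACAO`) and the helper functions `extract_spans` and `f1_for_type` are hypothetical and used for illustration only.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((etype, start, i))
        start, etype = None, None
        # A "B-" tag (or a stray "I-" tag) opens a new entity.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, etype = i, tag[2:]
    return spans


def f1_for_type(gold: List[str], pred: List[str], etype: str) -> float:
    """Exact-match entity-level F1 for a single entity type."""
    gold_spans = {s for s in extract_spans(gold) if s[0] == etype}
    pred_spans = {s for s in extract_spans(pred) if s[0] == etype}
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative tags only (hypothetical labels, not taken from the dataset files).
gold = ["B-PESSOA", "I-PESSOA", "O", "B-LEGISLACAO", "I-LEGISLACAO", "I-LEGISLACAO"]
pred = ["B-PESSOA", "I-PESSOA", "O", "B-LEGISLACAO", "I-LEGISLACAO", "O"]

person_f1 = f1_for_type(gold, pred, "PESSOA")
legislation_f1 = f1_for_type(gold, pred, "LEGISLACAO")
```

In this toy example the person entity is matched exactly, while the predicted legislation span is one token too short and therefore counts as both a false positive and a false negative under exact matching.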