LegalEc: A New Corpus for Complex Word Identification Research in Law Studies in Ecuatorian Spanish

Citations: 0
Authors
Ortiz-Zambrano, Jenny A. [1]
Espin-Riofrio, Cesar [1]
Montejo-Raez, Arturo [2]
Affiliations
[1] Univ Guayaquil, Guayaquil, Ecuador
[2] Univ Jaen, Jaen 23071, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2023, Issue 71
Keywords
Lexical complexity; feature integration; corpus generation; legal language; Spanish
DOI
10.26342/2023-71-19
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present LegalEc, a new annotated corpus of complex lexis constructed from legal texts in Ecuadorian Spanish. We detail its compilation and annotation process. In order to provide a resource for the scientific community to continue research in the area of Lexical Simplification in the Spanish language, several complex word prediction experiments have been carried out on this corpus. We extracted 23 linguistic features which we combined with the encodings generated by models such as XLM-RoBERTa and RoBERTa-BNE (from the MarIA project). The evaluation shows that the combination of these features improves the prediction of lexical complexity.
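The feature combination described in the abstract can be illustrated with a minimal sketch. It assumes the combination is a plain concatenation of a model encoding with standardized linguistic features; the abstract does not state how the authors combined them. The function names `standardize` and `combine`, the 4-dimensional placeholder encodings, and the two example features (word length, syllable count) are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column so hand-crafted features share one scale."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def combine(encodings: List[List[float]], features: List[List[float]]) -> List[List[float]]:
    """Concatenate each encoding with its standardized linguistic features (assumed scheme)."""
    if len(encodings) != len(features):
        raise ValueError("encodings and features must describe the same words")
    scaled = standardize(features)
    return [list(enc) + feats for enc, feats in zip(encodings, scaled)]


# Toy data: three target words with placeholder 4-dimensional encodings.
# Real XLM-RoBERTa or RoBERTa-BNE encodings would be much larger vectors.
encodings = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.5, 0.1, 0.9], [0.7, 0.3, 0.2, 0.6]]
# Hypothetical features per word: word length, syllable count.
features = [[5.0, 1.0], [12.0, 3.0], [8.0, 2.0]]

combined = combine(encodings, features)
```

Each row of `combined` could then be passed to any regressor or classifier that predicts lexical complexity. Standardizing first keeps large-valued features such as word length from dominating the encoding dimensions.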
Pages: 247-259
Number of pages: 13
Related papers
36 in total
[1]  
Alarcon R., 2020, IBERLEF SEPLN, P24
[2]   UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks [J].
Antonio Garcia-Diaz, Jose ;
Almela, Angela ;
Alcaraz-Marmol, Gema ;
Valencia-Garcia, Rafael .
PROCESAMIENTO DEL LENGUAJE NATURAL, 2020, (65) :139-142
[3]  
Anula A, 2008, LE L, V2, P162
[4]  
Cabrera-Melendez J. L., 2022, Ethnobotany Research and Applications, V24
[5]  
Camposa R. A., 2020, Revista Tecnologia en Marcha, pag
[6]   Predicting the proficiency level of language learners using lexical indices [J].
Crossley, Scott A. ;
Salsbury, Tom ;
McNamara, Danielle S. .
LANGUAGE TESTING, 2012, 29 (02) :243-263
[7]  
Davidson S, 2020, PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), P7238
[8]  
Desai A.T., 2021, P 15 INT WORKSH SEM, P548, DOI 10.18653/v1/2021.semeval-1.67
[9]   How-to Bureaucracy: A Concept of Citizens' Administrative Literacy [J].
Doring, Matthias .
ADMINISTRATION & SOCIETY, 2021, 53 (08) :1155-1177
[10]  
Mosquera A., 2021, P 15 INT WORKSH SEM, P554, DOI 10.18653/v1/2021.semeval-1.68