Collection and Automatic Analysis with Natural Language Processing on a Corpus of Andean Oral Literature Implemented on the Web

Cited by: 0
Authors
Soria Solis, Ivan [1]
Castro Buleje, Carlos Yinmel [1]
Silvera Reynaga, Humberto [1]
Mamani Macedo, Mauro Felix [2]
Leon Soncco, Dionicia [2]
Mautino Guillen, Alejandro Giancarlo [3]
Affiliations
[1] Univ Nacl Jose Maria Arguedas, Andahuaylas, Peru
[2] Univ Nacl Mayor San Marcos, Lima, Peru
[3] Univ Nacl Santiago Antunez Mayolo, Huaraz, Peru
Source
INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2024 | 2024 / Vol. 1068
Keywords
Andean oral literature; Word embeddings; FastText; Natural language processing
DOI
10.1007/978-3-031-66336-9_32
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Oral literature is transmitted through tradition and is typically not cataloged or recorded. Analyzing it is challenging because few computational tools exist for its collection, preservation, and study. Andean oral literature in Quechua is particularly affected, as computational resources to support its study and processing are scarce. This study aims to establish a Web-based repository for building a corpus of Andean oral literature texts. We used this corpus to train a natural language processing model that enables automatic analysis and topic classification of the texts. We applied the FastText tool, for which a pre-trained Quechua model is available. Because spelling in this language varies considerably, word embeddings were preferred: they represent meaning according to context and can handle similar words. A classification model was trained on the collected texts and evaluated on the corpus dataset, both with and without the pre-trained vectors, using accuracy, precision, recall, and F1. The model performed best when no pre-trained model was used.
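The remark about handling similar words can be illustrated with the character n-gram idea behind FastText subword embeddings. The sketch below is not the authors' code and does not call the FastText library. The helper names `char_ngrams` and `jaccard` are hypothetical, the 3 to 6 n-gram range is an assumption taken from commonly cited FastText word-vector defaults (the abstract does not state the configuration used), and the spelling pair "qucha" / "qocha" is only an illustrative example of orthographic variation.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of `word`.

    The word is wrapped in '<' and '>' boundary markers first, as in the
    FastText subword scheme. min_n and max_n are assumed values.
    """
    marked = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def jaccard(a, b):
    """Jaccard overlap between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two illustrative spellings of the same word (assumed example pair).
variant_a = "qucha"
variant_b = "qocha"

grams_a = char_ngrams(variant_a)
grams_b = char_ngrams(variant_b)
shared = grams_a & grams_b

print("shared n-grams:", sorted(shared))
print("overlap:", round(jaccard(grams_a, grams_b), 3))
```

Because a subword model builds a word vector from the vectors of its n-grams, spelling variants that share n-grams end up with related representations even when one variant never appears in the training corpus.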
Pages: 449-463
Page count: 15