Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language

Cited by: 4

Authors:

Aguero-Torales, Marvin M. [1,3]

Lopez-Herrera, Antonio G. [1]

Vilares, David [2]

Affiliations:

[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Calle Daniel Saucedo Aranda S-N, Granada 18071, Granada, Spain

[2] Univ A Coruna, Dept Comp Sci & Informat Technol, CITIC, Campus Elvina S-N, La Coruna 15008, A Coruna, Spain

[3] Global CoE Data Intelligence, Camino Cerro Gamos 1, Madrid 28224, Spain

Source:

COGNITIVE COMPUTATION | 2023 / Vol. 15 / Issue 04

Funding:

European Research Council

Keywords:

Natural language processing; Sentiment analysis; Affective analysis; Code-switching; Low-resource languages

DOI:

10.1007/s12559-023-10165-0

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identification. Then, we developed several neural network models, including large language models specifically designed for Guarani, and compared their performance against off-the-shelf multilingual and Spanish pre-trained models for the aforementioned dimensions. Our experiments show that language models incorporating Guarani during pre-training or pre-fine-tuning consistently achieve the best results, despite limited resources (a single 24-GB GPU and only 800K tokens). Notably, even a Guarani BERT model with just two layers of Transformers shows a favorable balance between accuracy and computational power, likely due to the inherent low-resource nature of the task. We present a comprehensive overview of corpus creation and model development for low-resource languages like Guarani, particularly in the context of its code-switching with Spanish, resulting in Jopara. Our findings shed light on the challenges and strategies involved in analyzing affective language in such linguistic contexts.

Pages: 1391-1406

Page count: 16

