Using Large Language Models to Improve Sentiment Analysis in Latvian Language

Cited by: 1
Authors
Purvins, Pauls [1]
Urtans, Evalds [2]
Caune, Vairis [2]
Affiliations
[1] Univ Latvia, Riga, Latvia
[2] Ventspils Univ Appl Sci, Dept Comp Sci, Ventspils, Latvia
Source
BALTIC JOURNAL OF MODERN COMPUTING | 2024 / Vol. 12 / No. 02
Keywords
Large Language Models; Sentiment Analysis; Dataset creation; Latvian Language; Deep Learning; ChatGPT; Prompt Engineering;
DOI
10.22364/bjmc.2024.12.2.03
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
This empirical study explores the use of large language models (LLMs) in sentiment analysis and presents a new approach to creating a dataset in the Latvian language using Reddit data. Using prompt engineering with the GPT-3.5-turbo model (the latest at the time of writing), we achieved 82% accuracy in three-class sentiment classification, which exceeds previous research on the Latvian Tweet Sentiment Corpus by 50%. We also demonstrate that LLMs can partially replace human labelers, making dataset creation more cost-effective, especially for larger datasets. This work contributes to sentiment analysis in non-English languages, leveraging the power of LLMs. The paper introduces a new LVReddit dataset that contains more than 90,000 samples, making it the largest available sentiment dataset for the Latvian language. Our findings confirm the LLM's underlying "understanding" of language. However, LLMs occasionally deviate from response templates, making parsing challenging. Future research should investigate fine-tuned models based on novel datasets and analyze language patterns.
Pages: 165-175
Page count: 11