Using Large Language Models to Improve Sentiment Analysis in Latvian Language

Authors
Purvins, Pauls [1]
Urtans, Evalds [2]
Caune, Vairis [2]
Affiliations
[1] University of Latvia, Riga, Latvia
[2] Ventspils University of Applied Sciences, Department of Computer Science, Ventspils, Latvia
Source
Baltic Journal of Modern Computing | 2024 / Vol. 12 / No. 2
Keywords
Large Language Models; Sentiment Analysis; Dataset creation; Latvian Language; Deep Learning; ChatGPT; Prompt Engineering
DOI
10.22364/bjmc.2024.12.2.03
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
This empirical study explores the use of large language models (LLMs) for sentiment analysis and presents a new approach to creating a Latvian-language dataset from Reddit data. Using prompt engineering with the GPT-3.5-turbo model (the latest at the time of writing), we achieved an accuracy of 82%, which exceeds previous research on the Latvian Tweet Sentiment Corpus by 50% in three-class sentiment classification. We also demonstrate that LLMs can partially replace human labelers, making dataset creation more cost-effective, especially for larger datasets. This work contributes to sentiment analysis in non-English languages by leveraging the power of LLMs. The paper introduces a new LVReddit dataset containing more than 90,000 samples, making it the largest available sentiment dataset for the Latvian language. Our findings confirm the LLM's underlying "understanding" of language. However, LLMs occasionally deviate from the response templates they are given, which makes parsing their output challenging. Future research should investigate fine-tuned models based on novel datasets and analyze language patterns.
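The abstract notes that the model sometimes ignores the requested response template, which complicates parsing. The following minimal Python sketch shows one way to tolerate such replies when scoring three-class predictions. It rests on several assumptions. The three labels are assumed to be the English words positive, neutral, and negative. The prompt template and the functions `build_prompt`, `parse_label`, and `accuracy` are hypothetical illustrations, not the authors' prompt or pipeline. No model API is called.

```python
import re
from typing import List, Optional

# Assumed label names: the abstract only states "three class" classification.
LABELS = ("positive", "neutral", "negative")

# Hypothetical prompt template for illustration; not the prompt used in the paper.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the following Latvian text as "
    "positive, neutral or negative. Answer with exactly one word.\n\n"
    "Text: {text}\nSentiment:"
)


def build_prompt(text: str) -> str:
    """Insert one sample into the hypothetical prompt template."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_label(reply: str) -> Optional[str]:
    """Map a free-form model reply to a single label, or None if unusable.

    Case, punctuation and extra words around the label are ignored, so
    replies such as "Sentiment: Positive." still parse.
    """
    words = re.findall(r"[a-z]+", reply.lower())
    found = {word for word in words if word in LABELS}
    # Accept the reply only if exactly one distinct label is mentioned.
    return found.pop() if len(found) == 1 else None


def accuracy(predicted: List[Optional[str]], gold: List[str]) -> float:
    """Share of samples whose parsed label equals the gold label.

    Unparseable replies (None) never match a gold label, so they count as errors.
    """
    if not gold or len(predicted) != len(gold):
        raise ValueError("expected two non-empty lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Made-up replies and gold labels, used only to exercise the functions.
    replies = ["Positive.", "Sentiment: NEGATIVE", "neutral or positive", "Neutral"]
    gold = ["positive", "negative", "neutral", "neutral"]
    predicted = [parse_label(r) for r in replies]
    print(predicted)  # ['positive', 'negative', None, 'neutral']
    print(accuracy(predicted, gold))  # 0.75
```

The sketch rejects replies that mention zero labels or more than one, and it counts them as errors, which keeps the accuracy figure conservative. The keyword match is deliberately simple. A negated reply such as "not negative" would be parsed as negative, and a reply written in Latvian would not match any of the English labels.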
Pages: 165–175
Number of pages: 11
Related papers (22 in total)
  • [1] Garkaje, Ginta; Zilgalve, Evelina; Dargis, Roberts. 2014. Normalization and Automatized Sentiment Analysis of Contemporary Online Latvian Language. Human Language Technologies – The Baltic Perspective (Baltic HLT 2014), vol. 268, pp. 83–86.
  • [2] Gedins, K. 2013. Automatiska teksta emocionalas noskanas noteiksana latviesu valoda.
  • [3] Gulbinskis, I. 2010. Digitalo tekstu sentimenta analize.
  • [4] Kapociute-Dzikiene, Jurgita; Damasevicius, Robertas; Wozniak, Marcin. 2019. Sentiment Analysis of Lithuanian Texts Using Traditional and Deep Learning Approaches. Computers, vol. 8, no. 1.
  • [5] Kojima, T. 2022. arXiv:2205.11916. DOI 10.48550/arXiv.2205.11916.
  • [6] Maas, A. L. 2011. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
  • [7] Nicmanis, D. 2017. Sabiedribas attieksmes modelesana, izmantojot sentimenta analizi.
  • [8] Pajupuu, Hille; Altrov, Rene; Pajupuu, Jaan. 2016. Identifying Polarity in Different Text Types. Folklore: Electronic Journal of Folklore, no. 64, pp. 125–142.
  • [9] Peisenieks, Janis; Skadins, Raivis. 2014. Uses of Machine Translation in the Sentiment Analysis of Tweets. Human Language Technologies – The Baltic Perspective (Baltic HLT 2014), vol. 268, pp. 126–131.
  • [10] Pinnis, Marcis. 2018. Latvian Tweet Corpus and Investigation of Sentiment Analysis for Latvian. Human Language Technologies – The Baltic Perspective (Baltic HLT 2018), vol. 307, pp. 112–119.