A Sesotho news headlines dataset for sentiment analysis

Cited by: 3
Authors
Mokhosi, Refuoe [1]
Shivachi, Casper-Shikali [2]
Sethobane, Matello [1]
Affiliations
[1] Botho Univ, Fac Engn & Technol, West Wing, Thetsane, Lesotho
[2] South Eastern Kenya Univ, Comp Sci & Technol Dept, POB 170-90200, Kitui, Kenya
Keywords
Sesotho dataset; News headlines; Sentiment analysis; Aspect based sentiment analysis; Natural language processing; Machine learning
DOI
10.1016/j.dib.2024.110371
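Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy, earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract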
Sentiment Analysis (SA) is a subfield of Natural Language Processing (NLP) that has become a promising research area, enabling language-specific services. Although research on high-resource languages such as English and Chinese has achieved promising results, research on low-resource African languages such as Sesotho is still in its infancy because text and speech datasets are limited. This study contributes by making available the Sesotho News (SN) dataset, an annotated dataset for the SA and Aspect-Based Sentiment Analysis (ABSA) tasks. The dataset may be used for NLP research to benefit 1.85 million Sesotho speakers in Lesotho and 11.5 million speakers in South Africa. It includes 4651 headlines for the ABSA task and 2401 headlines for the SA task, written in Lesotho's orthography of Sesotho. The news headlines were collected from Sesotho online newspapers and then annotated for the ABSA and SA tasks. Spearman's correlation and Cohen's kappa show good agreement between the annotators, implying that the SN dataset is of gold-standard quality. © 2024 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
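The two agreement measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the label names ("pos", "neg", "neu"), the example annotations, and the function names are assumptions made purely for illustration, and the abstract does not state the label scheme used in the SN dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical annotations of six headlines by two annotators.
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "pos"]
annotator_2 = ["pos", "neg", "neg", "neg", "neu", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Spearman needs ordered values, so labels are mapped to scores (an assumption).
score = {"neg": -1, "neu": 0, "pos": 1}
rho = spearman_rho([score[l] for l in annotator_1],
                   [score[l] for l in annotator_2])
```

Cohen's kappa corrects raw agreement for the agreement expected by chance, which matters when one sentiment class dominates. Spearman's correlation only makes sense once the labels are placed on an ordinal scale, which is why the sketch maps them to numeric scores first.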
Pages: 8