Construction and Evaluation of Sentiment Datasets for Low-Resource Languages: The Case of Uzbek

Times cited: 5
Authors
Kuriyozov, Elmurod [1]
Matlatipov, Sanatbek [2]
Alonso, Miguel A. [1]
Gomez-Rodriguez, Carlos [1]
Affiliations
[1] Univ A Coruna, Fac Informat, Dept Comp, CITIC,Grp LYS, Campus Elvina, La Coruna 15071, Spain
[2] Natl Univ Uzbekistan, Univ St 4, Tashkent 100174, Uzbekistan
Source
HUMAN LANGUAGE TECHNOLOGY: CHALLENGES FOR COMPUTER SCIENCE AND LINGUISTICS, LTC 2019 | 2022 / Vol. 13212
Keywords
Sentiment analysis; Low-resource languages; Uzbek language
DOI
10.1007/978-3-031-05328-3_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To our knowledge, human language processing technologies for most low-resource languages lack well-established linguistic resources for developing sentiment analysis applications. These languages urgently need such tools and resources to overcome NLP barriers so that language technologies can deliver more benefits to their speakers. In this paper, we help fill that gap for Uzbek by providing the first annotated corpora for Uzbek polarity classification. Our methodology involves collecting a medium-sized, manually annotated dataset and a larger dataset automatically translated from existing resources. We then use these datasets to train what are, to our knowledge, the first sentiment analysis models for the Uzbek language, using both traditional machine learning techniques and recent deep learning models. Both sets of techniques achieve similar accuracy. The best model on the manually annotated test set is a convolutional neural network with 88.89% accuracy, and the best model on the translated set is a logistic regression classifier with 89.56% accuracy. The accuracy of the deep learning models is limited by the quality of the available pre-trained word embeddings.
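The abstract reports that a logistic regression classifier was the best model on the translated set. To make the polarity classification task concrete, here is a minimal from-scratch sketch of binary sentiment classification with bag-of-words features and L2-regularized logistic regression trained by full-batch gradient descent. The toy training sentences, the whitespace tokenizer, the helper names (`build_vocab`, `featurize`, `train_logreg`) and the hyperparameters are all hypothetical illustrations. This is not the authors' feature set, data, or training setup.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization (an assumption; real Uzbek text
    # needs proper handling of punctuation, apostrophes and script)
    return text.lower().split()


def build_vocab(docs):
    # Map each distinct token to a feature index in order of first appearance
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def featurize(doc, vocab):
    # Sparse bag-of-words counts {feature_index: count}; unseen tokens are dropped
    counts = Counter(tokenize(doc))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    # Linear score b + w . x over the sparse features
    return b + sum(w[i] * v for i, v in x.items())


def train_logreg(X, y, dim, lr=0.5, epochs=200, l2=0.01):
    # Full-batch gradient descent on mean log loss plus an L2 penalty on w
    w = [0.0] * dim
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]
        grad_b = 0.0
        for x, label in zip(X, y):
            err = sigmoid(score(x, w, b)) - label  # d(log loss)/d(score)
            for i, v in x.items():
                grad_w[i] += err * v / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    # Label 1 = positive polarity, 0 = negative polarity
    return 1 if sigmoid(score(x, w, b)) >= 0.5 else 0


def accuracy(preds, gold):
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


# Hypothetical toy data: yaxshi = "good", yomon = "bad", juda = "very", dastur = "program"
train_docs = ["yaxshi dastur", "juda yaxshi", "yomon dastur", "juda yomon"]
train_labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
X = [featurize(d, vocab) for d in train_docs]
w, b = train_logreg(X, train_labels, len(vocab))

train_acc = accuracy([predict(x, w, b) for x in X], train_labels)
test_preds = [predict(featurize(d, vocab), w, b) for d in ["yaxshi", "yomon"]]
print(train_acc, test_preds)  # prints 1.0 [1, 0]
```

The toy data is symmetric between "yaxshi" and "yomon", so the learned bias stays near zero and the two words receive weights of opposite sign, while "juda" and "dastur" (which occur in both classes) stay near zero. The 89.56% figure in the abstract comes from the authors' own translated dataset and setup, which this sketch does not reproduce.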
Pages: 232-243
Page count: 12