A semiautomatic annotation approach for sentiment analysis

Cited by: 11
Authors
Alahmary, Rahma [1, 2]
Al-Dossari, Hmood [1]
Affiliations
[1] King Saud Univ, Informat Syst Dept, POB 145111, Riyadh 4545, Saudi Arabia
[2] Al Imam Mohammad Ibn Saud Islamic Univ, Informat Syst Dept, Riyadh, Saudi Arabia
Keywords
Annotation; deep learning; machine learning; Saudi dialect; sentiment analysis; opinion
DOI
10.1177/01655515211006594
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sentiment analysis (SA) aims to extract users' opinions automatically from their posts and comments. Almost all prior work has used machine learning algorithms. Recently, deep learning approaches have shown promising performance in SA research. However, deep learning is data-hungry and requires large datasets to learn from, which increases the time needed for data annotation. In this research, we propose a semiautomatic approach that uses Naive Bayes (NB) to annotate a new dataset, reducing the human effort and time spent on the annotation process. We created a dataset of Saudi dialect tweets to train and test the NB classifier, which achieved an accuracy of 83%. The trained semiautomatic model was then used to annotate a new dataset, which was in turn used to train and test deep learning classifiers for Saudi dialect SA. Three deep learning classifiers were tested: convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the deep learning classifiers outperformed SVM: CNN achieved the highest performance, followed by Bi-LSTM and then LSTM. The proposed semiautomatic annotation approach is usable and shows promise for speeding up annotation and saving time and effort.
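The annotation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain multinomial Naive Bayes with add-one smoothing, whitespace tokenization, hypothetical English toy sentences in place of Saudi dialect tweets, and a hypothetical confidence threshold (`threshold=0.8`) below which a text is routed to a human annotator. The abstract does not say how the human and automatic parts are divided, so that routing rule is an assumption made only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive whitespace tokenization; real tweets need more cleaning.
    return text.lower().split()


def train_nb(labeled):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_proba(model, text):
    """Return normalized class probabilities using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {lab: v / norm for lab, v in exp_scores.items()}


def annotate(model, texts, threshold=0.8):
    """Auto-label confident predictions; send the rest to manual review.

    The threshold is a hypothetical parameter, not taken from the paper.
    """
    auto, review = [], []
    for text in texts:
        probs = predict_proba(model, text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            auto.append((text, label))
        else:
            review.append(text)
    return auto, review


# Hypothetical seed set standing in for the manually labeled tweets.
seed = [
    ("great service love it", "pos"),
    ("love this phone", "pos"),
    ("terrible service hate it", "neg"),
    ("hate this awful phone", "neg"),
]
model = train_nb(seed)
auto, review = annotate(model, ["love it great", "the phone arrived"])
print("auto-labeled:", auto)
print("needs manual review:", review)
```

In a pipeline like the one the abstract outlines, the auto-labeled pairs would form the larger dataset used to train the downstream classifiers, while any texts held back for review would be labeled by hand.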
Pages: 398-410
Number of pages: 13