ArabicDialects: An Efficient Framework for Arabic Dialects Opinion Mining on Twitter Using Optimized Deep Neural Networks

Cited by: 18

Authors:

Abdelminaam, Diaa Salama [1, 2]

Neggaz, Nabil [3]

Gomaa, Ibrahim Abd Elatif [4, 5]

Ismail, Fatma Helmy [2]

Elsawy, Ahmed A. [5]

Affiliations:

[1] Benha Univ, Fac Comp & Artificial Intelligence, Informat Syst Dept, Banha 13511, Egypt

[2] Misr Int Univ, Fac Comp Sci, Cairo 11311, Egypt

[3] Univ Sci Technol Oran Mohamed Boudiaf USTO MB, Fac Math & Informat, Dept Informat, Lab Signal Image Parole SIMPA, Oran 31000, Algeria

[4] Al Obour High Inst Management & Informat, Comp Sci Dept, Cairo 11235, Egypt

[5] Benha Univ, Fac Comp & Artificial Intelligence, Comp Sci Dept, Banha 12311, Egypt

Source:

IEEE ACCESS | 2021 / Vol. 9

Keywords:

Arabic opinion mining (AOM); Arabic dialects; modern standard Arabic (MSA); deep learning; machine learning; sentiment analysis

DOI:

10.1109/ACCESS.2021.3094173

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

The rapid development of communication tools such as social networks, Twitter, and WhatsApp has generated a large mass of important textual data. The COVID-19 pandemic has also inflamed social networks, so the automatic analysis of opinions has become paramount. The purpose of this paper is to classify Arabic tweets as positive, negative, or neutral. Analyzing opinions expressed in Arabic poses a real challenge, which lies in the use of different dialects (Egyptian, Saudian, Maghrebian, Gulfian, Levantine, Syrian, ...). In this paper, we introduce two major components. The first employs six machine learning (ML) methods, namely Decision Trees (DT), Logistic Regression (LR), k-Nearest Neighbors (K-NN), Random Forests (RF), Support Vector Machines (SVM), and Naive Bayes (NB), with TF-IDF as the feature extraction method. The second tests three Deep Learning (DL) variants based on multiplicative Long Short-Term Memory (mLSTM), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), with word embeddings as the input vectors. The experimental study was validated using three Arabic language corpora (TEAD, ATSAD, and ASTD) and two learning modes (hold-out and 10-fold cross-validation). The results in terms of Accuracy (ACC), Precision (PREC), Recall (REC), and F1-score (F1) show clearly better performance for the DL techniques under the 10-fold strategy compared to the state of the art. The experiments reported in the paper reveal that the proposed DL models achieved the best results.
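The four evaluation measures named in the abstract (ACC, PREC, REC, F1) can be illustrated with a minimal Python sketch for the three-class setting (positive, negative, neutral). This is not the authors' code: the function name `classification_scores`, the label strings, and the use of macro-averaging across classes are assumptions made for illustration, since the abstract does not say how per-class scores are combined.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the abstract does not state
    which averaging scheme the paper uses.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        prec = tp[label] / predicted if predicted else 0.0
        rec = tp[label] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "ACC": sum(tp.values()) / len(y_true),
        "PREC": sum(precisions) / n,
        "REC": sum(recalls) / n,
        "F1": sum(f1s) / n,
    }


# Hypothetical gold labels and predictions for four tweets.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(classification_scores(gold, pred))
```

Under a 10-fold protocol, a function like this would typically be applied to each held-out fold and the ten results averaged; that aggregation step is likewise an assumption rather than something the abstract specifies.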


Pages: 97079-97099

Page count: 21

References: 75 in total
