Sentiment Analysis in Maghrebi Arabic Dialects with Enhanced BERT Models and Big Data Processing

Cited by: 0
Authors
Taha, Marbouh [1]
Halima, Outada [1]
Abdelaziz, Chetouani [1]
Omayma, Mahmoudi [2]
Naoufal, El Allali [2]
Affiliations
[1] Mohammed First Univ, LAMAO Lab, Oujda, Morocco
[2] Mohammed First Univ, MASI Lab, Oujda, Morocco
Source
DIGITAL TECHNOLOGIES AND APPLICATIONS, ICDTA 2024, VOL 4 | 2024 / Vol. 1101
Keywords
Sentiment analysis; Maghrebi Arabic; BERT-mini; BERT-base; Natural Language Processing; Deep Learning; Big Data; Apache Spark
DOI
10.1007/978-3-031-68675-7_2
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment analysis in Maghrebi Arabic is linguistically challenging and hampered by scarce resources. The challenges include the many dialects spoken across the Maghreb and their distinctive grammatical systems. We introduce a complete framework that uses the BERT-mini and BERT-base (Bidirectional Encoder Representations from Transformers) models, combined with the big data processing capabilities of Apache Spark, to improve scalability and performance. We select a variety of datasets, apply preprocessing steps such as tokenization and normalization while preserving special characters and emojis, and explore several model architectures. BERT-mini reaches accuracy of up to 0.885 and BERT-base reaches 0.899, substantially outperforming conventional machine learning techniques. The paper shows that BERT models can be integrated effectively with big data technology and identifies areas that need further investigation. By addressing the linguistic complexity of Maghrebi Arabic, the framework offers insights and practical applications in domains such as social media monitoring, market analysis, and customer feedback evaluation.
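The abstract names normalization and tokenization that keep special characters and emojis, but gives no implementation details. The following is a minimal Python sketch of what such a step might look like. The specific rules (stripping diacritics and tatweel, unifying alef variants, collapsing repeated characters) are assumptions drawn from common Arabic NLP practice, not the authors' pipeline, and the function names `normalize` and `tokenize` are hypothetical.

```python
import re

# Assumed normalization rules (common Arabic NLP practice, not from the paper).
DIACRITICS = re.compile(r"[\u064B-\u0652]")          # tashkeel marks, fathatan to sukun
TATWEEL = "\u0640"                                   # elongation character
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
REPEATS = re.compile(r"(.)\1{2,}")                   # three or more repeats of one character


def normalize(text: str) -> str:
    """Normalize Arabic text while leaving emojis and punctuation in place."""
    text = DIACRITICS.sub("", text)            # drop short-vowel marks
    text = text.replace(TATWEEL, "")           # drop elongation
    text = ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = REPEATS.sub(r"\1\1", text)          # cap character runs at two
    return " ".join(text.split())              # collapse whitespace


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization over the normalized text (hypothetical helper)."""
    return normalize(text).split()
```

In a real pipeline the whitespace tokenizer would typically be replaced by the subword tokenizer that ships with the chosen BERT checkpoint; the sketch only illustrates that emojis and special characters pass through normalization untouched.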
Pages: 13-22
Page count: 10