A comprehensive survey on Arabic text augmentation: approaches, challenges, and applications

Cited by: 0

Authors:

Ahmed Adel ElSabagh [1]

Shahira Shaaban Azab [1]

Hesham Ahmed Hefny [1]

Affiliations:

[1] Cairo University, Department of Computer Science, Faculty of Graduate Studies for Statistical Research

Source:

Neural Computing and Applications | 2025 / Vol. 37 / Issue 10

Keywords:

Text augmentation; Arabic text; Natural language processing; Deep learning

DOI:

10.1007/s00521-025-11020-z

Abstract:

Arabic is a linguistically complex language with a rich structure and syntax that pose unique challenges for natural language processing (NLP), primarily due to the scarcity of large, reliable annotated datasets essential for training models. The variety of dialects and the mixing of more than one language within a single conversation further complicate the development and efficacy of deep learning models targeting Arabic. Data augmentation (DA) techniques have emerged as a promising solution to tackle data scarcity and improve model performance. However, implementing DA in Arabic NLP presents its own challenges, particularly in maintaining semantic integrity and adapting to the language’s intricate morphological structure. This survey comprehensively examines Arabic data augmentation techniques, covering strategies for model training, methods for evaluating augmentation performance, the effects and applications of augmentation on data, downstream NLP tasks, augmentation problems and proposed solutions, an in-depth literature review, and conclusions. Through detailed analysis of 75 primary and 9 secondary papers, we categorize DA methods into diversity enhancement, resampling, and secondary approaches, each targeting specific challenges inherent in augmenting Arabic datasets. The goal is to offer insights into DA effectiveness, identify research gaps, and suggest future directions for advancing NLP in Arabic.

Pages: 7015-7048

Page count: 33
