A comprehensive survey on Arabic text augmentation: approaches, challenges, and applications

Cited by: 0
Authors
Ahmed Adel ElSabagh [1]
Shahira Shaaban Azab [1]
Hesham Ahmed Hefny [1]
Affiliation
[1] Cairo University, Department of Computer Science, Faculty of Graduate Studies for Statistical Research
Keywords
Text augmentation; Arabic text; Natural language processing; Deep learning;
DOI
10.1007/s00521-025-11020-z
Abstract
Arabic is a linguistically complex language with a rich structure and valuable syntax that pose unique challenges for natural language processing (NLP), primarily due to the scarcity of large, reliable annotated datasets essential for training models. The varieties of dialects and mixtures of more than one language within a single conversation further complicate the development and efficacy of deep learning models targeting Arabic. Data augmentation (DA) techniques have emerged as a promising solution to tackle data scarcity and improve model performance. However, implementing DA in Arabic NLP presents its challenges, particularly in maintaining semantic integrity and adapting to the language’s intricate morphological structure. This survey comprehensively examines various aspects of Arabic data augmentation techniques, covering strategies for model training, methods for evaluating augmentation performance, understanding the effects and applications of augmentation on data, studying NLP downstream tasks, addressing augmentation problems, proposing solutions, conducting in-depth literature reviews, and drawing conclusions. Through detailed analysis of 75 primary and 9 secondary papers, we categorize DA methods into diversity enhancement, resampling, and secondary approaches, each targeting specific challenges inherent in augmenting Arabic datasets. The goal is to offer insights into DA effectiveness, identify research gaps, and suggest future directions for advancing NLP in Arabic.
Pages: 7015-7048
Number of pages: 33
Related papers
50 results in total
  • [31] A comprehensive survey of federated transfer learning: challenges, methods and applications
    GUO Wei
    ZHUANG Fuzhen
    ZHAN Xiao
    TONG Yiqi
    DONG Jin
    Frontiers of Computer Science, 2024, 18 (06)
  • [32] A Comprehensive Survey on Effective Feature Selection Approaches for Text Sentiment Classification Process
    Rajpoot, Abha Kiran
    Nand, Parma
    Abidi, Ali Imam
    2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 971 - 977
  • [33] Controllable image synthesis methods, applications and challenges: a comprehensive survey
    Huang, Shanshan
    Li, Qingsong
    Liao, Jun
    Wang, Shu
    Liu, Li
    Li, Lian
    ARTIFICIAL INTELLIGENCE REVIEW, 2024, 57 (12)
  • [34] Current Approaches in Arabic IR: A Survey
    Mustafa, Mohammed
    AbdAlla, Hisham
    Suleman, Hussein
    DIGITAL LIBRARIES: UNIVERSAL AND UBIQUITOUS ACCESS TO INFORMATION, PROCEEDINGS, 2008, 5362 : 406 - 407
  • [35] Employability prediction: a survey of current approaches, research challenges and applications
    Nesrine Mezhoudi
    Rawan Alghamdi
    Rim Aljunaid
    Gomathi Krichna
    Dilek Düştegör
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 : 1489 - 1505
  • [36] Employability prediction: a survey of current approaches, research challenges and applications
    Mezhoudi, Nesrine
    Alghamdi, Rawan
    Aljunaid, Rim
    Krichna, Gomathi
    Dustegor, Dilek
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 14 (3) : 1489 - 1505
  • [37] Offline Arabic Handwritten Text Recognition: A Survey
    Parvez, Mohammad Tanvir
    Mahmoud, Sabri A.
    ACM COMPUTING SURVEYS, 2013, 45 (02)
  • [38] SURVEY AND BIBLIOGRAPHY OF ARABIC OPTICAL TEXT RECOGNITION
    ALBADR, B
    MAHMOUD, SA
    SIGNAL PROCESSING, 1995, 41 (01) : 49 - 77
  • [39] Automatic arabic text summarization (AATS): A survey
    Elmenshawy, Maha A.
    Hamza, Taher
    El-Deeb, Reem
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (05) : 6077 - 6092
  • [40] A comprehensive survey for generative data augmentation
    Chen, Yunhao
    Yan, Zihui
    Zhu, Yunjie
    NEUROCOMPUTING, 2024, 600