A comprehensive survey on Arabic text augmentation: approaches, challenges, and applications

Cited by: 0
Authors
Ahmed Adel ElSabagh [1]
Shahira Shaaban Azab [1]
Hesham Ahmed Hefny [1]
Affiliations
[1] Department of Computer Science, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza
Keywords
Arabic text; Deep learning; Natural language processing; Text augmentation
DOI
10.1007/s00521-025-11020-z
Abstract
Arabic is a linguistically complex language with a rich structure and syntax that pose unique challenges for natural language processing (NLP), primarily because large, reliable annotated datasets essential for training models are scarce. Dialectal variety and the mixing of more than one language within a single conversation further complicate the development and efficacy of deep learning models targeting Arabic. Data augmentation (DA) techniques have emerged as a promising solution to tackle data scarcity and improve model performance. However, implementing DA in Arabic NLP presents its own challenges, particularly in maintaining semantic integrity and adapting to the language’s intricate morphological structure. This survey comprehensively examines Arabic data augmentation techniques, covering strategies for model training, methods for evaluating augmentation performance, the effects and applications of augmentation on data, downstream NLP tasks, augmentation problems and proposed solutions, and an in-depth literature review with conclusions. Through detailed analysis of 75 primary and 9 secondary papers, we categorize DA methods into diversity enhancement, resampling, and secondary approaches, each targeting specific challenges inherent in augmenting Arabic datasets. The goal is to offer insights into DA effectiveness, identify research gaps, and suggest future directions for advancing NLP in Arabic. © The Author(s) 2025.
Pages: 7015 - 7048
Page count: 33
Related papers
50 in total
  • [21] A Survey of Text Summarization Approaches Based on Deep Learning
    Hou, Sheng-Luan
    Huang, Xi-Kun
    Fei, Chao-Qun
    Zhang, Shu-Han
    Li, Yang-Yang
    Sun, Qi-Lin
    Wang, Chuan-Qing
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2021, 36 (03) : 633 - 663
  • [22] Stemming Impact on Arabic Text Categorization Performance: a Survey
    Al-Anzi, Fawaz S.
    AbuZeina, Dia
    2015 5TH INTERNATIONAL CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGY AND ACCESSIBILITY (ICTA), 2015,
  • [23] A survey of text summarization and Headline Generation methods in Arabic
    Shaibani, Arwa
    Elnagar, Ashraf
    PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING TECHNOLOGIES, ICMLT 2024, 2024, : 317 - 323
  • [24] A Comprehensive Survey on Deep Facial Expression Recognition: Challenges, Applications, and Future Guidelines
    Sajjad, Muhammad
    Ullah, Fath U. Min
    Ullah, Mohib
    Christodoulou, Georgia
    Cheikh, Faouzi Alaya
    Hijji, Mohammad
    Muhammad, Khan
    Rodrigues, Joel J. P. C.
    ALEXANDRIA ENGINEERING JOURNAL, 2023, 68 : 817 - 840
  • [25] Data Augmentation Using Transformers and Similarity Measures for Improving Arabic Text Classification
    Refai, Dania
    Abu-Soud, Saleh
    Abdel-Rahman, Mohammad J.
    IEEE ACCESS, 2023, 11 : 132516 - 132531
  • [26] A comprehensive survey on radio frequency (RF) fingerprinting: Traditional approaches, deep learning, and open challenges
    Jagannath, Anu
    Jagannath, Jithin
    Kumar, Prem Sagar Pattanshetty Vasanth
    COMPUTER NETWORKS, 2022, 219
  • [27] A Comprehensive Survey of Image Augmentation Techniques for Deep Learning
    Xu, Mingle
    Yoon, Sook
    Fuentes, Alvaro
    Park, Dong Sun
    PATTERN RECOGNITION, 2023, 137
  • [28] Investigating Hybrid Approaches for Arabic Text Diacritization with Recurrent Neural Networks
    Alqudah, Saba'
    Abandah, Gheith
    Arabiyat, Alaa
    2017 IEEE JORDAN CONFERENCE ON APPLIED ELECTRICAL ENGINEERING AND COMPUTING TECHNOLOGIES (AEECT), 2017,
  • [29] Data augmentation approaches in natural language processing: A survey
    Li, Bohan
    Hou, Yutai
    Che, Wanxiang
    AI OPEN, 2022, 3 : 71 - 90
  • [30] A comprehensive survey of techniques for developing an Arabic question answering system
    Alkhurayyif, Yazeed
    Sait, Abdul Rahaman Wahab
    PEERJ COMPUTER SCIENCE, 2023, 9