A comprehensive survey on Arabic text augmentation: approaches, challenges, and applications

Authors
Ahmed Adel ElSabagh [1]
Shahira Shaaban Azab [1]
Hesham Ahmed Hefny [1]
Affiliations
[1] Cairo University, Department of Computer Science, Faculty of Graduate Studies for Statistical Research
Keywords
Text augmentation; Arabic text; Natural language processing; Deep learning
DOI
10.1007/s00521-025-11020-z
Abstract
Arabic is a linguistically complex language whose rich structure and syntax pose unique challenges for natural language processing (NLP), chiefly because the large, reliable annotated datasets needed to train models are scarce. Dialectal variety and the mixing of more than one language within a single conversation further complicate the development and efficacy of deep learning models targeting Arabic. Data augmentation (DA) techniques have emerged as a promising way to tackle data scarcity and improve model performance. However, implementing DA in Arabic NLP presents its own challenges, particularly in maintaining semantic integrity and adapting to the language’s intricate morphological structure. This survey examines Arabic data augmentation techniques comprehensively, covering model training strategies, methods for evaluating augmentation performance, the effects and applications of augmentation on data, downstream NLP tasks, augmentation problems and proposed solutions, and an in-depth literature review. Through detailed analysis of 75 primary and 9 secondary papers, we categorize DA methods into diversity enhancement, resampling, and secondary approaches, each targeting specific challenges inherent in augmenting Arabic datasets. The goal is to offer insights into DA effectiveness, identify research gaps, and suggest future directions for advancing NLP in Arabic.
Pages: 7015 - 7048
Page count: 33
Related papers
50 in total
  • [1] A Comprehensive Survey on Arabic Sarcasm Detection: Approaches, Challenges and Future Trends
    Rahma, Alaa
    Azab, Shahira Shaaban
    Mohammed, Ammar
    IEEE ACCESS, 2023, 11 : 18261 - 18280
  • [2] A survey of Arabic text classification approaches
    Sayed, Mostafa
    Salem, Rashed K.
    Khder, Ayman E.
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2019, 59 (03) : 236 - 251
  • [3] A Survey of Extractive Arabic Text Summarization Approaches
    Lagrini, Samira
    Redjimi, Mohammed
    Aziz, Nabiha
    ARABIC LANGUAGE PROCESSING: FROM THEORY TO PRACTICE, 2018, 782 : 159 - 171
  • [4] Prescribed performance control approaches, applications and challenges: A comprehensive survey
    Bu, Xiangwei
    ASIAN JOURNAL OF CONTROL, 2023, 25 (01) : 241 - 261
  • [5] Data augmentation: A comprehensive survey of modern approaches
    Mumuni, Alhassan
    Mumuni, Fuseini
    ARRAY, 2022, 16
  • [6] Text Stemming: Approaches, Applications, and Challenges
    Singh, Jasmeet
    Gupta, Vishal
    ACM COMPUTING SURVEYS, 2016, 49 (03)
  • [7] A comprehensive study for Arabic Sentiment Analysis (Challenges and Applications)
    Alsayat, Ahmed
    Elmitwally, Nouh
    EGYPTIAN INFORMATICS JOURNAL, 2020, 21 (01) : 7 - 12
  • [9] Arabic text detection: a survey of recent progress challenges and opportunities
    Muaad, Abdullah Y.
    Raza, Shaina
    Naseem, Usman
    Davanagere, Hanumanthappa J. Jayappa
    APPLIED INTELLIGENCE, 2023, 53 (24) : 29845 - 29862
  • [10] Text Mining Challenges and Applications, A Comprehensive Review
    Khan, Muzammil
    Khan, Sarwar Shah
    Alharbi, Yasser
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2020, 20 (12) : 138 - 148