Hybrid Pipeline for Building Arabic Tunisian Dialect-standard Arabic Neural Machine Translation Model from Scratch

Authors
Kchaou, Sameh [1]
Boujelbane, Rahma [1]
Hadrich, Lamia [1]
Affiliations
[1] Univ Sfax, Sfax, Tunisia
Keywords
Neural Machine Translation; data augmentation; Arabic Tunisian Dialect; Modern Standard Arabic
DOI
10.1145/3568674
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning is one of the most promising approaches to machine translation. It has achieved impressive results for well-resourced languages, where large amounts of parallel data are available. For low-resource languages such as the Arabic dialects, however, deep learning models have failed because of the lack of available parallel corpora. In this article, we present a method for creating a parallel corpus in order to build an effective neural machine translation (NMT) model that translates Tunisian Dialect texts found on social networks into Modern Standard Arabic (MSA). To this end, we propose a set of data augmentation methods that aim to increase the size of the existing state-of-the-art parallel corpus. Evaluating the impact of this step, we found that it effectively improved both the size and the quality of the corpus. Then, using the resulting corpus, we compare the effectiveness of CNN, RNN, and Transformer models for translating Tunisian Dialect into MSA. Experiments show that the Transformer model achieves the best translation, with a BLEU score of 60, versus 33.36 and 53.98 for the RNN and CNN models, respectively.
Pages: 21