Translation from Tunisian Dialect to Modern Standard Arabic: Exploring Finite-State Transducers and Sequence-to-Sequence Transformer Approaches

Authors
Torjmen, Roua [1]
Haddar, Kais [2]
Affiliations
[1] Univ Sfax, Fac Econ & Management Sfax, Sfax, Tunisia
[2] Univ Sfax, Fac Sci Sfax, Sfax, Tunisia
Keywords
Tunisian dialect; finite-state transducer; sequence-to-sequence transformer; machine translation
DOI
10.1145/3681788
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Translation from the mother tongue, including the Tunisian dialect, to Modern Standard Arabic is a significant task in natural language processing because of its wide range of applications. Researchers have recently shown increased interest in the Tunisian dialect, driven primarily by the massive volume of content that Tunisians have generated spontaneously on social media since the revolution. This article presents two distinct translators for converting the Tunisian dialect into Modern Standard Arabic. The first translator uses a rule-based approach, employing a collection of finite-state transducers and a bilingual dictionary derived from the study corpus. The second translator relies on deep learning, specifically a sequence-to-sequence transformer model trained on a parallel corpus. To evaluate and compare the two translators, we conducted tests on a parallel corpus of 8,599 words. The translator based on finite-state transducers achieved a BLEU score of 56.65, while the transformer-based translator achieved a higher score of 66.07.
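For readers unfamiliar with the metric, the following is a minimal sketch of how a corpus-level BLEU score on a 0-100 scale can be computed. It assumes the standard formulation (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty), a single reference per sentence, whitespace-tokenized input, and no smoothing. The abstract does not state the tokenization or BLEU configuration the authors used, so the function name `corpus_bleu` and all parameters here are illustrative assumptions, not the authors' evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, no smoothing.

    Illustrative sketch only; not the evaluation script used in the article.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Placeholder tokens stand in for translated and reference sentences.
    system_output = ["a b c d e".split()]
    reference = ["a b c d f".split()]
    print(round(corpus_bleu(system_output, reference), 2))
```

Scores from different BLEU implementations are comparable only when tokenization and smoothing match, so figures such as 56.65 and 66.07 should be read relative to the authors' own setup.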
Pages: 19