Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1 ]
Azmi, Aqil M. [1 ]
Affiliation
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; Readability
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification
TP [Automation technology; Computer technology]
Subject Classification Code
0812
Abstract
The process of text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods to simplify the text. To evaluate the models, we created and utilized two Arabic datasets, namely EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As resources were scarce, we made these datasets available to other researchers. The EW-SEW dataset is a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset has eight simplified reference sentences for each of the 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach.
© 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
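The record does not include code; as a rough illustration of the lexical-simplification component described in the abstract (an Arabic-BERT masked language model proposing substitutes for a complex word), the following Python sketch uses the Hugging Face transformers fill-mask pipeline. The model identifier asafaya/bert-base-arabic, the helper name suggest_substitutes, and the simple candidate filtering are assumptions made for illustration, not the authors' implementation.

# Minimal sketch (not the authors' code) of transformer-based lexical
# simplification: mask a complex word and let an Arabic BERT propose substitutes.
# The model id "asafaya/bert-base-arabic" is an assumed stand-in for Arabic-BERT.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="asafaya/bert-base-arabic")

def suggest_substitutes(sentence: str, complex_word: str, top_k: int = 10):
    """Return MLM-ranked candidate replacements for `complex_word` in `sentence`."""
    masked = sentence.replace(complex_word, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    # Keep every candidate except the original word; a full system would further
    # rank candidates by simplicity (e.g. corpus frequency) before substituting.
    return [c["token_str"] for c in candidates if c["token_str"] != complex_word]

In the hybrid setup reported in the abstract, such word-level substitutions would be combined with a sentence-level neural machine translation model that rewrites the whole sentence; the exact combination strategy is detailed in the paper itself.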
Pages: 13