Simplification of Arabic text: A hybrid approach integrating machine translation and transformer-based lexical model

Cited by: 3
Authors
Al-Thanyyan, Suha S. [1]
Azmi, Aqil M. [1]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, Riyadh 11543, Saudi Arabia
Keywords
Text simplification; Arabic text simplification; Lexical simplification; Neural machine translation; Transformers; Arabic corpora; Readability
DOI
10.1016/j.jksuci.2023.101662
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text simplification (TS) is crucial for enhancing the comprehension of written material, especially for people with low literacy levels and those who struggle to understand written content. In this study, we introduce the first automated approach to TS that combines word-level and sentence-level simplification techniques for Arabic text. We employ three models: a neural machine translation model, an Arabic-BERT-based lexical model, and a hybrid model that combines both methods to simplify the text. To evaluate the models, we created and utilized two Arabic datasets, namely EW-SEW and WikiLarge, comprising 82,585 and 249 sentence pairs, respectively. As resources were scarce, we made these datasets available to other researchers. The EW-SEW dataset is a commonly used English TS corpus that aligns each sentence in the original English Wikipedia (EW) with a simpler reference sentence from Simple English Wikipedia (SEW). In contrast, the WikiLarge dataset has eight simplified reference sentences for each of the 249 test sentences. The hybrid model outperformed the other models, achieving a BLEU score of 55.68, a SARI score of 37.15, and an FBERT score of 86.7% on the WikiLarge dataset, demonstrating the effectiveness of the combined approach. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 13
Related papers
27 in total
  • [1] Multilingual Controllable Transformer-Based Lexical Simplification
    Sheang, Kim Cheng
    Saggion, Horacio
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2023, (71): 109 - 123
  • [2] A transformer-based approach for Arabic offline handwritten text recognition
    Momeni, Saleh
    Babaali, Bagher
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3053 - 3062
  • [4] Classifier Based Text Simplification for Improved Machine Translation
    Tyagi, Shruti
    Chopra, Deepti
    Mathur, Iti
    Joshi, Nisheeth
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015: 46 - 50
  • [5] On compositional generalization of transformer-based neural machine translation
    Yin, Yongjing
    Fu, Lian
    Li, Yafu
    Zhang, Yue
    INFORMATION FUSION, 2024, 111
  • [6] An ensemble transformer-based model for Arabic sentiment analysis
    Mohamed, Omar
    Kassem, Aly M. M.
    Ashraf, Ali
    Jamal, Salma
    Mohamed, Ensaf Hussein
    SOCIAL NETWORK ANALYSIS AND MINING, 2022, 13 (01)
  • [8] Debugging Translations of Transformer-based Neural Machine Translation Systems
    Rikters, Matiss
    Pinnis, Marcis
    BALTIC JOURNAL OF MODERN COMPUTING, 2018, 6 (04): 403 - 417
  • [9] Character-Level Transformer-Based Neural Machine Translation
    Banar, Nikolay
    Daelemans, Walter
    Kestemont, Mike
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020: 149 - 156
  • [10] Transformer-Based Amharic-to-English Machine Translation With Character Embedding and Combined Regularization Techniques
    Asefa, Surafiel Habib
    Assabie, Yaregal
    IEEE ACCESS, 2025, 13 : 1090 - 1105