Text Simplification Using Transformer and BERT

Cited by: 4
Authors
Alissa, Sarah [1 ]
Wald, Mike [2 ]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Coll Comp Sci & Informat Technol, Dammam, Saudi Arabia
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton, England
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 75, No. 2
Keywords
Text simplification; neural machine translation; transformer
DOI
10.32604/cmc.2023.033647
Chinese Library Classification: TP [Automation Technology, Computer Technology]
Discipline Code: 0812
Abstract
Reading and writing are the main ways of interacting with web content. Text simplification tools help people with cognitive impairments, new language learners, and children, who may find complex web content difficult to understand. Text simplification is the process of converting complex text into more readable and understandable text. Recent approaches to text simplification adopt the machine translation paradigm, learning simplification rules from a parallel corpus of complex and simple sentences. In this paper, we propose two models based on the transformer, an encoder-decoder architecture that achieves state-of-the-art (SOTA) results in machine translation. Training proceeds in three steps: preprocessing the data with a subword tokenizer, training the model with the Adam optimizer, and decoding the output with the trained model. The first model uses the transformer alone; the second integrates Bidirectional Encoder Representations from Transformers (BERT) as the encoder to improve training time and results. The transformer-only model achieved a Bilingual Evaluation Understudy (BLEU) score of 53.78 on the WikiSmall dataset. The validation loss of the BERT-integrated model decreased much faster than that of the model without BERT, but its BLEU score was lower (44.54), possibly because the small dataset caused the model to overfit and generalize poorly. In future work, the second model could therefore be trained on a larger dataset such as WikiLarge. In addition, the models' results and the dataset were analyzed with different evaluation metrics to better understand their performance.
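The abstract's three-step pipeline (subword tokenization, Adam-optimized training, decoding) and the BERT-as-encoder design can be illustrated with a minimal PyTorch sketch. The paper does not publish its implementation, so everything below is an assumption made for illustration: the bert-base-uncased checkpoint, the hyperparameters (d_model=768, 8 heads, 6 decoder layers, the Adam settings), and the example sentences are hypothetical, not the authors' configuration.

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertEncoderSimplifier(nn.Module):
    """Transformer decoder on top of a pre-trained BERT encoder (second model)."""
    def __init__(self, vocab_size, d_model=768, nhead=8, num_layers=6):
        super().__init__()
        # Pre-trained BERT replaces the randomly initialized transformer encoder.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.tgt_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the complex sentence; BERT expects WordPiece subword ids.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        # Causal mask: each target position attends only to earlier positions.
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal)
        return self.out_proj(hidden)  # logits over the output vocabulary

# Step 1: subword tokenization (WordPiece here; the paper only says "subword").
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
src = tokenizer(["The legislation was subsequently repealed."], return_tensors="pt")
tgt = tokenizer(["The law was later cancelled."], return_tensors="pt")

# Step 2: one teacher-forced training step, optimized with Adam.
model = BertEncoderSimplifier(vocab_size=tokenizer.vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.98), eps=1e-9)
loss_fn = nn.CrossEntropyLoss(ignore_index=tokenizer.pad_token_id)

logits = model(src["input_ids"], src["attention_mask"], tgt["input_ids"][:, :-1])
loss = loss_fn(logits.reshape(-1, logits.size(-1)),
               tgt["input_ids"][:, 1:].reshape(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()

The first (transformer-only) model would correspond to swapping the BERT encoder for a randomly initialized nn.TransformerEncoder of the same width. Step 3 (decoding) would feed the model's own predictions back in autoregressively, and corpus-level BLEU over the decoded outputs could then be computed with a library such as sacrebleu.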
Pages: 3479-3495
Page count: 17
Related Papers
(50 items in total)
  • [41] NegAIT: A new parser for medical text simplification using morphological, sentential and double negation
    Mukherjee, Partha
    Leroy, Gondy
    Kauchak, David
    Rajanarayanan, Srinidhi
    Diaz, Damian Y. Romero
    Yuan, Nicole P.
    Pritchard, T. Gail
    Colina, Sonia
    JOURNAL OF BIOMEDICAL INFORMATICS, 2017, 69 : 55 - 62
  • [42] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    Qiang, Jipeng
    Zhang, Feng
    Li, Yun
    Yuan, Yunhao
    Zhu, Yi
    Wu, Xindong
    FRONTIERS OF COMPUTER SCIENCE, 2023, 17 (01)
  • [43] Boost Transformer with BERT and copying mechanism for ASR error correction
    Li, Wenkun
    Di, Hui
    Wang, Lina
    Ouchi, Kazushige
    Lu, Jing
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [44] BERT-LSTM network prediction model based on Transformer
    Guo, Jiachen
    Liu, Jun
    Yang, Chenxi
    Dong, Jianguo
    Wang, Zhengyi
    Dong, Shijian
    PROCEEDINGS OF THE 36TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC 2024, 2024, : 3098 - 3103
  • [45] Transformer for Handwritten Text Recognition Using Bidirectional Post-decoding
    Wick, Christoph
    Zoellner, Jochen
    Gruening, Tobias
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT III, 2021, 12823 : 112 - 126
  • [46] Which bug reports are valid and why? Using the BERT transformer to classify bug reports and explain their validity
    Meng, Qianru
    Visser, Joost
    PROCEEDINGS OF THE 4TH EUROPEAN SYMPOSIUM ON SOFTWARE ENGINEERING, ESSE 2023, 2024, : 52 - 60
  • [47] CATS: A Tool for Customized Alignment of Text Simplification Corpora
    Stajner, Sanja
    Franco-Salvador, Marc
    Rosso, Paolo
    Ponzetto, Simone Paolo
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3895 - 3903
  • [48] Medical Text Simplification by Medical Trainees: A Feasibility Study
    Choi, Yong K.
    Kirchhoff, Katrin
    Turner, Anne M.
    2016 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2016, : 334 - 340
  • [49] Text simplification and comprehensible input: A case for an intuitive approach
    Crossley, Scott A.
    Allen, David
    McNamara, Danielle S.
    LANGUAGE TEACHING RESEARCH, 2012, 16 (01) : 89 - 108
  • [50] Clear, easy, plain, and simple as keywords for text simplification
    Vecchiato, Sara
    FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2022, 5