Multilingual Controllable Transformer-Based Lexical Simplification

Citations: 0
Authors
Sheang, Kim Cheng [1]
Saggion, Horacio [1]
Affiliations
[1] Univ Pompeu Fabra, LaSTUS Grp, TALN Lab, DTIC, Barcelona, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2023 / Issue 71
Keywords
Multilingual Lexical Simplification; Controllable Lexical Simplification; Text Simplification; Multilinguality
DOI
10.26342/2023-71-9
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Suggesting simpler alternatives for complex words without compromising meaning would therefore help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pretrained masked language models to learn simpler alternatives for complex words. The evaluation results on three well-known LS datasets (LexMTurk, BenchLS, and NNSEval) show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Further evaluation of our approach on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively with the participating systems for English LS and even outperforms the GPT-3 model on several metrics. In addition, our model also obtains performance gains for Spanish and Portuguese.
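As a rough illustration of how the three ingredients named in the abstract (a language-specific prefix, control tokens, and candidate substitutes) could be combined into a single text-to-text input, here is a minimal sketch. The prefix strings, the control-token names `LEN` and `RANK`, the `[T]...[/T]` markers, and the `build_input` helper are all hypothetical and are not taken from the paper; the candidate list is assumed to have been produced beforehand by a masked language model.

```python
from typing import Dict, List

# Hypothetical per-language task prefixes (assumption, not the paper's format).
LANG_PREFIX: Dict[str, str] = {
    "en": "simplify en:",
    "es": "simplify es:",
    "pt": "simplify pt:",
}


def build_input(
    sentence: str,
    complex_word: str,
    lang: str,
    controls: Dict[str, float],
    candidates: List[str],
) -> str:
    """Assemble one source string: prefix, control tokens, marked sentence, candidates."""
    if complex_word not in sentence:
        raise ValueError("complex word must occur in the sentence")
    # Control tokens are rendered as <NAME_value>; names and format are illustrative.
    control_str = " ".join(f"<{name}_{value:.2f}>" for name, value in controls.items())
    # Mark the first occurrence of the complex word so the model knows what to replace.
    marked = sentence.replace(complex_word, f"[T]{complex_word}[/T]", 1)
    parts = [LANG_PREFIX[lang], control_str, marked, "candidates: " + ", ".join(candidates)]
    # Skip empty parts (for example, when no control tokens are given).
    return " ".join(part for part in parts if part)


example = build_input(
    "The cat perched on the mat.",
    "perched",
    "en",
    {"LEN": 0.8, "RANK": 0.75},
    ["sat", "rested"],
)
print(example)
```

A string built this way would be tokenized and passed to a sequence-to-sequence model during fine-tuning and inference; the actual input layout used by mTLS may differ.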
Pages: 109-123
Page count: 15