Multilingual Controllable Transformer-Based Lexical Simplification

Authors
Sheang, Kim Cheng [1]
Saggion, Horacio [1]
Affiliations
[1] Univ Pompeu Fabra, LaSTUS Grp, TALN Lab, DTIC, Barcelona, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2023 / Issue 71
Keywords
Multilingual Lexical Simplification; Controllable Lexical Simplification; Text Simplification; Multilinguality;
DOI
10.26342/2023-71-9
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Suggesting simpler alternatives for complex words without compromising meaning would therefore help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fine-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pretrained masked language models to learn simpler alternatives for complex words. The evaluation results on three well-known LS datasets - LexMTurk, BenchLS, and NNSEval - show that our model outperforms previous state-of-the-art models such as LSBert and ConLS. Further evaluation of our approach on part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively with the participating systems for English LS and even outperforms the GPT-3 model on several metrics. Our model also obtains performance gains for Spanish and Portuguese.
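As a rough illustration of the idea in the abstract, the sketch below shows one way a source string for a text-to-text model could combine a language-specific prefix, control tokens, and candidate words. It is not the authors' implementation: the function name `build_source`, the `<NAME_value>` token format, the `[T]...[/T]` markers, the `| candidates:` separator, and the example control names and values are all assumptions made for illustration only.

```python
from typing import Mapping, Sequence


def build_source(
    sentence: str,
    complex_word: str,
    lang_prefix: str,
    controls: Mapping[str, float],
    candidates: Sequence[str],
) -> str:
    """Assemble a single input string for a text-to-text model.

    Everything about the layout is hypothetical: the abstract only says that
    language-specific prefixes, control tokens, and masked-language-model
    candidates are used, not how they are serialized.
    """
    if complex_word not in sentence:
        raise ValueError("complex_word must occur in sentence")

    # Hypothetical control-token format, e.g. <WL_0.80>.
    control_part = " ".join(
        f"<{name}_{value:.2f}>" for name, value in controls.items()
    )
    # Mark only the first occurrence of the complex word (assumed markers).
    marked = sentence.replace(complex_word, f"[T]{complex_word}[/T]", 1)
    # Candidates would come from a pretrained masked language model;
    # here they are simply passed in by the caller.
    candidate_part = " ".join(candidates)

    return f"{lang_prefix}: {control_part} {marked} | candidates: {candidate_part}"


example = build_source(
    sentence="The cat perched on the mat.",
    complex_word="perched",
    lang_prefix="simplify en",
    controls={"WL": 0.8, "WR": 0.75},
    candidates=["sat", "rested"],
)
print(example)
```

Changing `lang_prefix` (for example to a hypothetical `"simplify es"`) is how a single model could, under this assumed layout, be steered toward a different language without changing the rest of the input.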
Pages: 109 - 123
Page count: 15