Enhancing Sentence Simplification in Portuguese: Leveraging Paraphrases, Context, and Linguistic Features

Cited by: 0
Authors
Scalercio, Arthur [1]
Finatto, Maria Jose [2]
Paes, Aline [1]
Affiliations
[1] Univ Fed Fluminense, Inst Comp, Niteroi, RJ, Brazil
[2] Univ Fed Rio Grande do Sul, Porto Alegre, RS, Brazil
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024
Keywords
GENERATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Automatic text simplification focuses on transforming texts into a more comprehensible version without sacrificing their precision. However, automatic methods usually require (paired) datasets that can be rather scarce in languages other than English. This paper presents a new approach to automatic sentence simplification that leverages paraphrases, context, and linguistic attributes to overcome the absence of paired texts in Portuguese. We frame the simplification problem as a textual style transfer task and learn a style representation using the sentences around the target sentence in the document and its linguistic attributes. Moreover, unlike most unsupervised approaches that require style-labeled training data, we fine-tune strong pre-trained models using sentence-level paraphrases instead of annotated data. Our experiments show that our model achieves remarkable results, surpassing the current state-of-the-art (BART+ACCESS) while competitively matching a Large Language Model.
Pages: 15076-15091
Page count: 16