Enhancing Sentence Simplification in Portuguese: Leveraging Paraphrases, Context, and Linguistic Features

Cited by: 0
Authors
Scalercio, Arthur [1]
Finatto, Maria Jose [2]
Paes, Aline [1]
Affiliations
[1] Univ Fed Fluminense, Inst Comp, Niteroi, RJ, Brazil
[2] Univ Fed Rio Grande do Sul, Porto Alegre, RS, Brazil
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024
Keywords
GENERATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic text simplification focuses on transforming texts into a more comprehensible version without sacrificing their precision. However, automatic methods usually require (paired) datasets that can be rather scarce in languages other than English. This paper presents a new approach to automatic sentence simplification that leverages paraphrases, context, and linguistic attributes to overcome the absence of paired texts in Portuguese. We frame the simplification problem as a textual style transfer task and learn a style representation using the sentences around the target sentence in the document and its linguistic attributes. Moreover, unlike most unsupervised approaches that require style-labeled training data, we fine-tune strong pre-trained models using sentence-level paraphrases instead of annotated data. Our experiments show that our model achieves remarkable results, surpassing the current state-of-the-art (BART+ACCESS) while competitively matching a Large Language Model.
Pages: 15076-15091
Number of pages: 16