Representation transfer and data cleaning in multi-views for text simplification

Cited by: 1

Authors:

He, Wei ^{[1,2]}

Farrahi, Katayoun ^{[1]}

Chen, Bin ^{[3]}

Peng, Bohua ^{[2]}

Villavicencio, Aline ^{[2]}

Affiliations:

[1] Univ Southampton, Dept Elect & Comp Sci, Southampton SO17 1BJ, England

[2] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, England

[3] Univ Sheffield, Dept Automatic Control & Syst Engn, Sheffield S1 3JD, England

Source:

PATTERN RECOGNITION LETTERS | 2024 / Vol. 177

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Text simplification; Sentence representation; Pre-trained language model; Data cleaning; Decoding;

DOI:

10.1016/j.patrec.2023.11.011

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Representation transfer is a widely used technique in natural language processing. We propose methods for cleaning WikiLarge, the dominant dataset for text simplification (TS), from multiple views to remove errors that affect model training and fine-tuning. The results show that our method effectively refines the dataset. We also propose transferring pre-trained text representations from a similar task (e.g., text summarization) to text simplification through a continued fine-tuning strategy, to improve the performance of pre-trained models on TS. This approach speeds up training and makes model convergence easier. In addition, we propose a new decoding strategy for simple text generation that produces simpler and more comprehensible text with controllable lexical simplicity. The experimental results show that our method achieves good performance on many evaluation metrics.


Pages: 40-46

Page count: 7