Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited by: 9
Authors
Qiang, Jipeng [1 ]
Zhang, Feng [1 ]
Li, Yun [1 ]
Yuan, Yunhao [1 ]
Zhu, Yi [1 ]
Wu, Xindong [2 ,3 ]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei 23009, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings;
DOI
10.1007/s11704-022-1244-0
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline classification code
0812;
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. A recent unsupervised statistical text simplification system based on phrase-based machine translation (UnsupPBMT) achieved good performance by initializing its phrase tables with similar words obtained from word embedding modeling. However, because word embedding modeling only captures relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base to predict similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
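The core idea in the abstract, using BERT's masked language modeling to propose similar-word candidates that can then seed phrase tables, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, and an example sentence and top_k value chosen purely for illustration.

```python
from transformers import pipeline

# Minimal sketch: use BERT's masked-LM head to propose context-aware
# substitutes for a complex word. Model name, sentence, and top_k are
# illustrative assumptions, not the paper's actual configuration.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The committee will scrutinize the proposal before voting."
complex_word = "scrutinize"

# Mask the complex word in its sentence context; BERT then ranks
# replacement tokens by how well they fit the surrounding words,
# rather than by embedding relatedness alone.
masked = sentence.replace(complex_word, fill_mask.tokenizer.mask_token, 1)

for cand in fill_mask(masked, top_k=10):
    token = cand["token_str"].strip()
    if token.lower() != complex_word.lower():
        print(f"{token}\t{cand['score']:.4f}")
```

Because the predictions are conditioned on the full sentence, the candidate list tends to contain words that are substitutable in context, which is the property the paper exploits when replacing embedding-based initialization with BERT-based initialization.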
Pages: 10
Related papers
50 records in total
  • [21] A Pre-trained Clinical Language Model for Acute Kidney Injury
    Mao, Chengsheng
    Yao, Liang
    Luo, Yuan
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 531 - 532
  • [22] The Impact of Training Methods on the Development of Pre-Trained Language Models
    Uribe, Diego
    Cuan, Enrique
    Urquizo, Elisa
    COMPUTACION Y SISTEMAS, 2024, 28 (01): : 109 - 124
  • [23] Aspect Based Sentiment Analysis by Pre-trained Language Representations
    Liang, Tianxin
    Yang, Xiaoping
    Zhou, Xibo
    Wang, Bingqian
    2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1262 - 1265
  • [24] SsciBERT: a pre-trained language model for social science texts
    Shen, Si
    Liu, Jiangfeng
    Lin, Litao
    Huang, Ying
    Zhang, Lin
    Liu, Chang
    Feng, Yutong
    Wang, Dongbo
    SCIENTOMETRICS, 2023, 128 (02) : 1241 - 1263
  • [25] Impact of data quality for automatic issue classification using pre-trained language models
    Colavito, Giuseppe
    Lanubile, Filippo
    Novielli, Nicole
    Quaranta, Luigi
    JOURNAL OF SYSTEMS AND SOFTWARE, 2024, 210
  • [26] Identifying Valid User Stories Using BERT Pre-trained Natural Language Models
    Scoggin, Sandor Borges
    Marques-Neto, Humberto Torres
    INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 3, WORLDCIST 2023, 2024, 801 : 167 - 177
  • [27] Quantifying Gender Bias in Arabic Pre-Trained Language Models
    Alrajhi, Wafa
    Al-Khalifa, Hend S.
    Al-Salman, Abdulmalik S.
    IEEE ACCESS, 2024, 12 : 77406 - 77420
  • [28] Unsupervised law article mining based on deep pre-trained language representation models with application to the Italian civil code
    Tagarelli, Andrea
    Simeri, Andrea
    ARTIFICIAL INTELLIGENCE AND LAW, 2022, 30 (03) : 417 - 473
  • [30] Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging
    Christian, Hans
    Suhartono, Derwin
    Chowanda, Andry
    Zamli, Kamal Z.
    JOURNAL OF BIG DATA, 2021, 8 (01)