Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited: 9
Authors
Qiang, Jipeng [1]
Zhang, Feng [1]
Li, Yun [1]
Yuan, Yunhao [1]
Zhu, Yi [1]
Wu, Xindong [2,3]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei 230009, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings
DOI
10.1007/s11704-022-1244-0
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes the phrase tables with similar words obtained from word embedding models. Because word embedding models capture only the relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base to predict similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
Pages: 10
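The abstract's core idea, using a masked language model to propose context-aware similar words rather than relying on context-free word embeddings, can be illustrated with a short sketch. The snippet below is a minimal illustration and not the authors' implementation: the model name bert-base-uncased, the helper similar_words, the example sentence, and the top-k cutoff are assumptions chosen purely for demonstration.

```python
# Minimal sketch: querying BERT's masked-LM head for context-aware
# substitution candidates. Not the authors' implementation; the model
# name, example sentence, and top_k value are illustrative assumptions.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def similar_words(sentence, target, top_k=10):
    """Mask `target` in `sentence` and return BERT's top-k in-context predictions."""
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)
    # Find the [MASK] position and rank the vocabulary by its logits.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    top_ids = logits[0, mask_pos].topk(top_k).indices.tolist()
    tokens = tokenizer.convert_ids_to_tokens(top_ids)
    # Keep whole-word candidates and drop the original target word itself.
    return [t for t in tokens if not t.startswith("##") and t.lower() != target.lower()]

print(similar_words("The cat perched on the mat.", "perched"))
# Possible output (model dependent): ['sat', 'lay', 'stood', 'rested', ...]
```

Unlike a context-free embedding neighbourhood, which also returns related but non-substitutable words, the masked-LM predictions above are constrained to fit the surrounding sentence, which is the motivation the abstract gives for the BERT-based initialization.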