Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited by: 9
Authors
Qiang, Jipeng [1 ]
Zhang, Feng [1 ]
Li, Yun [1 ]
Yuan, Yunhao [1 ]
Zhu, Yi [1 ]
Wu, Xindong [2 ,3 ]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei 230009, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings;
DOI
10.1007/s11704-022-1244-0
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes its phrase tables with similar words obtained from word embedding modeling. Since word embedding modeling only captures relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base to predict similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
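The core step described in the abstract, querying BERT for context-aware similar words, can be sketched as follows. This is only a minimal illustration, not the authors' UnsupPBMT implementation: the helper name bert_similar_words, the bert-base-uncased checkpoint, and the top-k cutoff are illustrative assumptions, built on the standard Hugging Face transformers masked-LM API.

    # Minimal sketch (not the paper's released code): ask BERT's masked-LM head
    # for words that can replace a target word in context. Such candidates could
    # seed phrase-table entries with similar words, as the abstract describes.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def bert_similar_words(sentence: str, target: str, top_k: int = 10):
        """Return BERT's top-k in-context replacement candidates for `target`."""
        # Replace the target word with BERT's [MASK] token.
        masked = sentence.replace(target, tokenizer.mask_token, 1)
        inputs = tokenizer(masked, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Locate the masked position and rank vocabulary items by probability.
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
        probs = logits[0, mask_pos, :].softmax(dim=-1).squeeze(0)
        top = torch.topk(probs, top_k)
        candidates = [tokenizer.convert_ids_to_tokens(int(i)) for i in top.indices]
        # Drop the original word and sub-word pieces; the rest are
        # context-aware similar-word candidates for the target.
        return [w for w in candidates if w != target and not w.startswith("##")]

    print(bert_similar_words("The cat perched on the mat.", "perched"))

In a pipeline like the one described, such in-context candidates (rather than plain embedding neighbors) would presumably be filtered and scored before initializing the phrase tables of the statistical simplification system.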
Pages: 10