Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited by: 9
Authors
Qiang, Jipeng [1 ]
Zhang, Feng [1 ]
Li, Yun [1 ]
Yuan, Yunhao [1 ]
Zhu, Yi [1 ]
Wu, Xindong [2 ,3 ]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei 230009, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings;
DOI
10.1007/s11704-022-1244-0
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes its phrase tables with similar words obtained from word embedding modeling. Since word embedding modeling only captures the relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
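As an illustration of the core idea, the sketch below (not the authors' released code) uses BERT's masked-language-model head to predict candidate substitutes for a target word by masking it in its sentence and taking the top-ranked fillers; such in-context predictions are the kind of similar-word signal the paper describes for initializing phrase tables. The model name (bert-base-uncased), the top_k value, and the helper predict_similar_words are assumptions made for this example.

# Minimal sketch, assuming the Hugging Face transformers API; not the authors' implementation.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def predict_similar_words(sentence: str, target: str, top_k: int = 10):
    # Mask the target word in context and let BERT's MLM head rank replacement tokens.
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: [1, seq_len, vocab_size]
    top_ids = torch.topk(logits[0, mask_pos], top_k).indices.tolist()
    candidates = [tokenizer.convert_ids_to_tokens(i) for i in top_ids]
    # Drop the original word and sub-word pieces; the rest are candidate substitutes.
    return [c for c in candidates if c != target and not c.startswith("##")]

# Example: candidate substitutes for "scrutinize" in context.
print(predict_similar_words("The committee will scrutinize the proposal carefully.", "scrutinize"))

Because the candidates are predicted from the full sentence context rather than from context-free word vectors, they tend to be actual substitutes rather than merely related words, which is the motivation for replacing embedding-based initialization with BERT.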
Pages: 10
相关论文
共 50 条
  • [1] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    Jipeng Qiang
    Feng Zhang
    Yun Li
    Yunhao Yuan
    Yi Zhu
    Xindong Wu
    Frontiers of Computer Science, 2023, 17
  • [2] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    QIANG Jipeng
    ZHANG Feng
    LI Yun
    YUAN Yunhao
    ZHU Yi
    WU Xindong
    Frontiers of Computer Science, 2023, 17 (01)
  • [3] Extremely Low Resource Text simplification with Pre-trained Transformer Language Model
    Maruyama, Takumi
    Yamamoto, Kazuhide
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 53 - 58
  • [4] Unsupervised Statistical Text Simplification
    Qiang, Jipeng
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (04) : 1802 - 1806
  • [5] RoBERTuito: a pre-trained language model for social media text in Spanish
    Manuel Perez, Juan
    Furman, Damian A.
    Alonso Alemany, Laura
    Luque, Franco
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7235 - 7243
  • [6] Leveraging Pre-Trained Language Model for Summary Generation on Short Text
    Zhao, Shuai
    You, Fucheng
    Liu, Zeng Yuan
    IEEE ACCESS, 2020, 8 : 228798 - 228803
  • [7] Using a Pre-Trained Language Model for Medical Named Entity Extraction in Chinese Clinic Text
    Zhang, Mengyuan
    Wang, Jin
    Zhang, Xuejie
    PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 312 - 317
  • [8] Issue Report Classification Using Pre-trained Language Models
    Colavito, Giuseppe
    Lanubile, Filippo
    Novielli, Nicole
    2022 IEEE/ACM 1ST INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING (NLBSE 2022), 2022, : 29 - 32
  • [9] A deep connection to Khasi language through pre-trained embedding
    Thabah, N. Donald Jefferson
    Mitri, Aiom Minnette
    Saha, Goutam
    Maji, Arnab Kumar
    Purkayastha, Bipul Shyam
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2025, 21 (01) : 179 - 193
  • [10] Pre-trained language models in medicine: A survey *
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154