Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited by: 9
Authors
Qiang, Jipeng [1 ]
Zhang, Feng [1 ]
Li, Yun [1 ]
Yuan, Yunhao [1 ]
Zhu, Yi [1 ]
Wu, Xindong [2 ,3 ]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei 230009, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings;
DOI
10.1007/s11704-022-1244-0
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes its phrase tables with similar words obtained from word embedding modeling. Since word embedding modeling only captures relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base to predict similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
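The core step described in the abstract, querying BERT for context-aware similar words, can be sketched as follows. This is only a minimal illustration, not the authors' UnsupPBMT implementation: the helper name bert_similar_words, the bert-base-uncased checkpoint, and the top-k cutoff are illustrative assumptions, built on the standard Hugging Face transformers masked-LM API.

    # Minimal sketch (not the paper's released code): ask BERT's masked-LM head
    # for words that can replace a target word in context. Such candidates could
    # seed phrase-table entries with similar words, as the abstract describes.
    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # illustrative checkpoint
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def bert_similar_words(sentence: str, target: str, top_k: int = 10):
        """Return BERT's top-k in-context replacement candidates for `target`."""
        # Replace the target word with BERT's [MASK] token.
        masked = sentence.replace(target, tokenizer.mask_token, 1)
        inputs = tokenizer(masked, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        # Locate the masked position and rank vocabulary items by probability.
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
        probs = logits[0, mask_pos, :].softmax(dim=-1).squeeze(0)
        top = torch.topk(probs, top_k)
        candidates = [tokenizer.convert_ids_to_tokens(int(i)) for i in top.indices]
        # Drop the original word and sub-word pieces; the rest are
        # context-aware similar-word candidates for the target.
        return [w for w in candidates if w != target and not w.startswith("##")]

    print(bert_similar_words("The cat perched on the mat.", "perched"))

In a pipeline like the one described, such in-context candidates (rather than plain embedding neighbors) would presumably be filtered and scored before initializing the phrase tables of the statistical simplification system.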
Pages: 10