Unsupervised statistical text simplification using pre-trained language modeling for initialization

Cited by: 9
Authors
Qiang, Jipeng [1 ]
Zhang, Feng [1 ]
Li, Yun [1 ]
Yuan, Yunhao [1 ]
Zhu, Yi [1 ]
Wu, Xindong [2 ,3 ]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei 230009, Peoples R China
[3] Mininglamp Acad Sci, Mininglamp, Beijing 100089, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
text simplification; pre-trained language modeling; BERT; word embeddings;
DOI
10.1007/s11704-022-1244-0
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Unsupervised text simplification has attracted much attention due to the scarcity of high-quality parallel text simplification corpora. Recently, an unsupervised statistical text simplification method based on a phrase-based machine translation system (UnsupPBMT) achieved good performance; it initializes its phrase tables with similar words obtained from word embedding modeling. Since word embedding modeling only captures the relatedness between words, the phrase tables in UnsupPBMT contain many dissimilar words. In this paper, we propose an unsupervised statistical text simplification method that uses the pre-trained language model BERT for initialization. Specifically, we use BERT as a general linguistic knowledge base for predicting similar words. Experimental results show that our method outperforms state-of-the-art unsupervised text simplification methods on three benchmarks and even outperforms some supervised baselines.
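As an illustration of the core idea, the sketch below (not the authors' released code) uses BERT's masked-language-model head to predict candidate substitutes for a target word by masking it in its sentence and taking the top-ranked fillers; such in-context predictions are the kind of similar-word signal the paper describes for initializing phrase tables. The model name (bert-base-uncased), the top_k value, and the helper predict_similar_words are assumptions made for this example.

# Minimal sketch, assuming the Hugging Face transformers API; not the authors' implementation.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def predict_similar_words(sentence: str, target: str, top_k: int = 10):
    # Mask the target word in context and let BERT's MLM head rank replacement tokens.
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: [1, seq_len, vocab_size]
    top_ids = torch.topk(logits[0, mask_pos], top_k).indices.tolist()
    candidates = [tokenizer.convert_ids_to_tokens(i) for i in top_ids]
    # Drop the original word and sub-word pieces; the rest are candidate substitutes.
    return [c for c in candidates if c != target and not c.startswith("##")]

# Example: candidate substitutes for "scrutinize" in context.
print(predict_similar_words("The committee will scrutinize the proposal carefully.", "scrutinize"))

Because the candidates are predicted from the full sentence context rather than from context-free word vectors, they tend to be actual substitutes rather than merely related words, which is the motivation for replacing embedding-based initialization with BERT.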
Pages: 10
相关论文
共 50 条
  • [1] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    Jipeng Qiang
    Feng Zhang
    Yun Li
    Yunhao Yuan
    Yi Zhu
    Xindong Wu
    Frontiers of Computer Science, 2023, 17
  • [2] Unsupervised statistical text simplification using pre-trained language modeling for initialization
    QIANG Jipeng
    ZHANG Feng
    LI Yun
    YUAN Yunhao
    ZHU Yi
    WU Xindong
    Frontiers of Computer Science, 2023, 17 (01)
  • [3] Extremely Low Resource Text simplification with Pre-trained Transformer Language Model
    Maruyama, Takumi
    Yamamoto, Kazuhide
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2019, : 53 - 58
  • [4] Unsupervised Statistical Text Simplification
    Qiang, Jipeng
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2021, 33 (04) : 1802 - 1806
  • [5] RoBERTuito: a pre-trained language model for social media text in Spanish
    Manuel Perez, Juan
    Furman, Damian A.
    Alonso Alemany, Laura
    Luque, Franco
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7235 - 7243
  • [6] Leveraging Pre-Trained Language Model for Summary Generation on Short Text
    Zhao, Shuai
    You, Fucheng
    Liu, Zeng Yuan
    IEEE ACCESS, 2020, 8 : 228798 - 228803
  • [7] Using a Pre-Trained Language Model for Medical Named Entity Extraction in Chinese Clinic Text
    Zhang, Mengyuan
    Wang, Jin
    Zhang, Xuejie
    PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 312 - 317
  • [8] Issue Report Classification Using Pre-trained Language Models
    Colavito, Giuseppe
    Lanubile, Filippo
    Novielli, Nicole
    2022 IEEE/ACM 1ST INTERNATIONAL WORKSHOP ON NATURAL LANGUAGE-BASED SOFTWARE ENGINEERING (NLBSE 2022), 2022, : 29 - 32
  • [9] A deep connection to Khasi language through pre-trained embedding
    Thabah, N. Donald Jefferson
    Mitri, Aiom Minnette
    Saha, Goutam
    Maji, Arnab Kumar
    Purkayastha, Bipul Shyam
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2025, 21 (01) : 179 - 193
  • [10] Pre-trained language models in medicine: A survey *
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154