Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

Cited by: 0
Authors
Nguyen D.T. [1 ]
Tran T. [1 ]
Affiliations
[1] Saigon University, Ho Chi Minh City, Vietnam
Keywords
data augmentation; data-to-text generation; deep learning; fine-tuning; pre-trained language models; sequence-to-sequence models; Universal Dependencies
DOI
10.1504/IJIIDS.2023.10053426
Abstract
Natural language generation (NLG) has focused in recent years on data-to-text tasks with different kinds of structured input. The generated text should contain the given information, be grammatically correct, and meet other quality criteria. In this research, we propose an approach that combines strong pre-trained language models with input data augmentation. The data studied in this work are Universal Dependencies (UD), a framework developed for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across languages for cross-lingual learning. We study English UD structures, which are modified into two groups. In the first group, the modification phase removes the word-order information and lemmatises the tokens. In the second group, the modification phase removes the functional words and surface-oriented morphological details. For both groups of modified structures, we apply the same approach to explore how the pre-trained sequence-to-sequence models T5 (text-to-text transfer transformer) and BART perform on the training data. We augment the training data by creating several permutations of each input structure. The results show that our approach can generate good-quality English text and highlight the value of studying strategies for representing UD inputs. Copyright © 2023 Inderscience Enterprises Ltd.
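The abstract describes linearising modified UD structures, augmenting the training data with several permutations of each input, and fine-tuning T5 or BART on the resulting pairs. The sketch below is a minimal illustration of that pipeline, not the authors' code: it assumes the Hugging Face transformers/datasets libraries and the conllu parser, and the linearisation format (lemma|UPOS|deprel), the treebank file path, the number of permutations, and the training hyperparameters are illustrative assumptions.

# Minimal sketch of the pipeline described in the abstract (not the authors' code).
# Assumptions: Hugging Face transformers/datasets, the conllu parser, a local
# English UD treebank file, and an illustrative lemma|UPOS|deprel linearisation.
import random
from conllu import parse_incr
from datasets import Dataset
from transformers import (DataCollatorForSeq2Seq, T5ForConditionalGeneration,
                          T5TokenizerFast, Trainer, TrainingArguments)

def linearise(sentence):
    # Group-1 style input: keep lemma, UPOS and dependency relation per token;
    # word-order information is discarded by the permutation step below.
    return [f"{tok['lemma']}|{tok['upos']}|{tok['deprel']}"
            for tok in sentence if isinstance(tok["id"], int)]

def permutations(nodes, n=3, seed=0):
    # Data augmentation: several random orderings of the same set of UD nodes.
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        outputs.append(" ".join(shuffled))
    return outputs

# Build (linearised UD -> surface sentence) training pairs.
pairs = []
with open("en_ewt-ud-train.conllu", encoding="utf-8") as f:  # hypothetical path
    for sent in parse_incr(f):
        target = sent.metadata["text"]
        for source in permutations(linearise(sent)):
            pairs.append({"source": "generate: " + source, "target": target})

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def encode(example):
    # Tokenise the linearised UD input and the target sentence.
    enc = tokenizer(example["source"], truncation=True, max_length=256)
    enc["labels"] = tokenizer(example["target"], truncation=True,
                              max_length=128)["input_ids"]
    return enc

dataset = Dataset.from_list(pairs).map(encode, remove_columns=["source", "target"])
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ud2text-t5",
                           per_device_train_batch_size=8,
                           num_train_epochs=3, learning_rate=3e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

Swapping t5-base for facebook/bart-base (with BartTokenizerFast and BartForConditionalGeneration) gives the BART variant; group-2 inputs would additionally drop functional words and surface-oriented morphological details before linearisation.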
Pages: 89-105
Page count: 16