Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

Cited by: 0
Authors
Nguyen D.T. [1 ]
Tran T. [1 ]
Affiliations
[1] Saigon University, Ho Chi Minh City
Keywords
data augmentation; data-to-text generation; deep learning; fine-tune; pre-trained language models; sequence-to-sequence models; Universal Dependencies
DOI
10.1504/IJIIDS.2023.10053426
Abstract
In recent years, natural language generation (NLG) has focused on data-to-text tasks with different kinds of structured input. The generated text should contain the given information, be grammatically correct, and meet other quality criteria. In this research, we propose an approach that combines strong pre-trained language models with input data augmentation. The data studied in this work are Universal Dependencies (UD) structures; UD is developed as a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) for cross-lingual learning. We study English UD structures, which are modified into two groups. In the first group, the modification removes the order information of each word and lemmatises the tokens. In the second group, the modification removes functional words and surface-oriented morphological details. For both groups of modified structures, we apply the same approach to explore how the pre-trained sequence-to-sequence models text-to-text transfer transformer (T5) and BART perform on the training data. We augment the training data by creating several permutations of each input structure. The results show that our approach can generate good-quality English text and that studying strategies for representing UD inputs is a promising direction. Copyright © 2023 Inderscience Enterprises Ltd.
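The following is a minimal Python sketch of the kind of pipeline the abstract describes: a CoNLL-U sentence is linearised into a lemma/POS/dependency string with word order discarded, several shuffled permutations are generated as augmented inputs, and each permutation is paired with the same reference sentence for fine-tuning a sequence-to-sequence model such as T5 or BART. The field selection, the `|` separator, the permutation count, and all helper names are illustrative assumptions, not the authors' published format.

```python
# Sketch of UD input linearisation plus permutation-based data augmentation.
# Field choices and separators are assumptions for illustration only.
import random

# A tiny CoNLL-U fragment (10 tab-separated columns per token line).
CONLLU_SENTENCE = """\
1\tThe\tthe\tDET\tDT\tDefinite=Def\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_
3\tbarked\tbark\tVERB\tVBD\tTense=Past\t0\troot\t_\t_
"""

def parse_conllu(block):
    """Parse CoNLL-U token lines into dicts with the fields we need."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        tokens.append({"lemma": cols[2], "upos": cols[3],
                       "head": cols[6], "deprel": cols[7]})
    return tokens

def linearise(tokens):
    """Group-1 style input: lemmas with POS and dependency labels, no surface order kept."""
    return " ".join(f"{t['lemma']}|{t['upos']}|{t['deprel']}" for t in tokens)

def permutations_for_augmentation(tokens, k=3, seed=0):
    """Create k shuffled copies of the token list as extra training inputs."""
    rng = random.Random(seed)
    sources = []
    for _ in range(k):
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        sources.append(linearise(shuffled))
    return sources

if __name__ == "__main__":
    toks = parse_conllu(CONLLU_SENTENCE)
    target = "The dog barked"  # reference sentence the model should generate
    for source in permutations_for_augmentation(toks):
        print(source, "->", target)  # (source, target) pairs for seq2seq fine-tuning
```

Under these assumptions, the resulting (source, target) pairs could be fed to a standard sequence-to-sequence fine-tuning loop (e.g., Hugging Face `Seq2SeqTrainer`); how many permutations to generate per sentence is a tunable design choice.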
Pages: 89-105
Number of pages: 16