Natural language generation from Universal Dependencies using data augmentation and pre-trained language models

Cited: 0
|
Authors
Nguyen D.T. [1 ]
Tran T. [1 ]
Institutions
[1] Saigon University, Ho Chi Minh City
Keywords
data augmentation; data-to-text generation; deep learning; fine-tune; pre-trained language models; sequence-to-sequence models; Universal Dependencies;
DOI
10.1504/IJIIDS.2023.10053426
Abstract
Natural language generation (NLG) has focused in recent years on data-to-text tasks with different structured inputs. The generated text should contain the given information, be grammatically correct, and meet other criteria. In this research, we propose an approach that combines strong pre-trained language models with input data augmentation. The data studied in this work are Universal Dependencies (UD) structures; UD is a framework developed for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across languages for cross-lingual learning. We study English UD structures, which we modify in two groups. In the first group, the modification phase removes the word-order information and lemmatises the tokens. In the second group, it removes functional words and surface-oriented morphological details. For both groups of modified structures, we apply the same approach to explore how the pre-trained sequence-to-sequence models T5 (text-to-text transfer transformer) and BART perform on the training data. We augment the training data by creating several permutations of each input structure. The results show that our approach can generate good-quality English text and motivate further study of strategies for representing UD inputs. Copyright © 2023 Inderscience Enterprises Ltd.
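The permutation-based augmentation described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, the `lemma|deprel` token representation, and the sampling of three permutations per structure are our assumptions.

```python
import itertools
import random

def augment_ud_input(tokens, target_sentence, num_permutations=3, seed=0):
    """Create augmented (input, target) training pairs by permuting the
    order of a linearised UD structure while keeping the target text fixed.

    `tokens` is a list of (lemma, deprel) pairs standing in for one
    modified UD structure (word order removed, tokens lemmatised).
    """
    rng = random.Random(seed)
    perms = list(itertools.permutations(tokens))
    rng.shuffle(perms)
    pairs = []
    for perm in perms[:num_permutations]:
        # Serialise each permutation as a flat source string for a
        # sequence-to-sequence model such as T5 or BART.
        source = " ".join(f"{lemma}|{deprel}" for lemma, deprel in perm)
        pairs.append((source, target_sentence))
    return pairs

# Toy example: a lemmatised, order-free UD structure for "The cats sleep."
tokens = [("cat", "nsubj"), ("sleep", "root"), ("the", "det")]
pairs = augment_ud_input(tokens, "The cats sleep.")
```

Each permutation yields a distinct serialised input mapped to the same reference sentence, which encourages the fine-tuned model to treat the linearised UD tokens as an unordered set rather than memorising one fixed ordering.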
Pages: 89–105
Page count: 16