Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models

Cited by: 14
Authors
Body, Thomas [1 ]
Tao, Xiaohui [1 ]
Li, Yuefeng [2 ]
Li, Lin [3 ]
Zhong, Ning [4 ]
Affiliations
[1] Univ Southern Queensland, Sch Sci, Darling Hts, Qld, Australia
[2] Queensland Univ Technol, Sci & Engn Fac, Brisbane, Qld, Australia
[3] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[4] Maebashi Inst Technol, Dept Life Sci & Informat, Maebashi, Gumma, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Natural language processing; Translation; Sentiment analysis; Data augmentation;
DOI
10.1016/j.eswa.2021.115033
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment analysis classification models trained using neural networks require large amounts of data, but collecting such datasets demands significant time and resources. Although artificial data has been used successfully in computer vision, few effective and generalizable methods exist for creating artificial augmented text data. In this paper, a text-based data augmentation method called back-and-forth translation is proposed, which can be used to artificially increase the size of any natural language dataset. By creating augmented text data and adding it to the original dataset, empirical experiments demonstrate that back-and-forth translation data augmentation can reduce the error rate of binary sentiment classification models by up to 3.4%. These results are shown to be statistically significant.
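The augmentation idea described in the abstract can be sketched as follows. This is only one plausible reading of the method, not the authors' implementation: each training sentence is translated into a pivot language and back, and the resulting paraphrase is added to the dataset with the original label. The `translate` function below is a hypothetical placeholder for whatever machine-translation backend is used.

```python
def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for a real MT system (e.g. an API or a neural MT model).
    Here it returns the input unchanged so the sketch stays runnable."""
    return text


def back_and_forth(text: str, src: str = "en", pivot: str = "fr") -> str:
    """Round-trip translate src -> pivot -> src to produce a paraphrase."""
    pivoted = translate(text, src, pivot)
    return translate(pivoted, pivot, src)


def augment_dataset(samples):
    """Pair every labelled example with its round-trip paraphrase,
    keeping the original sentiment label, roughly doubling the dataset."""
    augmented = list(samples)
    for text, label in samples:
        augmented.append((back_and_forth(text), label))
    return augmented
```

With a real translation backend, the round trip introduces lexical and syntactic variation while (ideally) preserving sentiment, which is what makes the extra examples useful for training.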
Pages: 12