Using back-and-forth translation to create artificial augmented textual data for sentiment analysis models

Cited by: 14
Authors
Body, Thomas [1 ]
Tao, Xiaohui [1 ]
Li, Yuefeng [2 ]
Li, Lin [3 ]
Zhong, Ning [4 ]
Affiliations
[1] Univ Southern Queensland, Sch Sci, Darling Hts, Qld, Australia
[2] Queensland Univ Technol, Sci & Engn Fac, Brisbane, Qld, Australia
[3] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[4] Maebashi Inst Technol, Dept Life Sci & Informat, Maebashi, Gumma, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Natural language processing; Translation; Sentiment analysis; Data augmentation;
DOI
10.1016/j.eswa.2021.115033
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment analysis classification models trained using neural networks require large amounts of data, but collecting such datasets demands significant time and resources. Although artificial data has been used successfully in computer vision, few effective and generalizable methods exist for creating artificial augmented text data. In this paper, a text-based data augmentation method called back-and-forth translation is proposed, which can be used to artificially increase the size of any natural language dataset. By creating augmented text data and adding it to the original dataset, empirical experiments demonstrate that back-and-forth translation data augmentation can reduce the error rate of binary sentiment classification models by up to 3.4%. These results are shown to be statistically significant.
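The augmentation idea described in the abstract can be sketched as follows. This is only one plausible reading of the method, not the authors' implementation: each training sentence is translated into a pivot language and back, and the resulting paraphrase is added to the dataset with the original label. The `translate` function below is a hypothetical placeholder for whatever machine-translation backend is used.

```python
def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder for a real MT system (e.g. an API or a neural MT model).
    Here it returns the input unchanged so the sketch stays runnable."""
    return text


def back_and_forth(text: str, src: str = "en", pivot: str = "fr") -> str:
    """Round-trip translate src -> pivot -> src to produce a paraphrase."""
    pivoted = translate(text, src, pivot)
    return translate(pivoted, pivot, src)


def augment_dataset(samples):
    """Pair every labelled example with its round-trip paraphrase,
    keeping the original sentiment label, roughly doubling the dataset."""
    augmented = list(samples)
    for text, label in samples:
        augmented.append((back_and_forth(text), label))
    return augmented
```

With a real translation backend, the round trip introduces lexical and syntactic variation while (ideally) preserving sentiment, which is what makes the extra examples useful for training.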
Pages: 12