Syntactically Coherent Text Augmentation for Sequence Classification

Cited by: 7
Authors
Pandey, Suraj [1]
Akhtar, Md. Shad [1]
Chakraborty, Tanmoy [1]
Affiliations
[1] Indraprastha Inst Informat Technol Delhi, Dept Comp Sci & Engn, New Delhi 110020, India
Keywords
Generators; Task analysis; Syntactics; Computational modeling; Training; Computer architecture; Data models; Data augmentation; generative adversarial network (GAN); sequence classification; sentiment analysis; network
DOI
10.1109/TCSS.2021.3075774
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we address the problem of data scarcity in sequence classification tasks. We propose AugmentGAN, a simple yet effective generative adversarial network (GAN)-based text augmentation model that ensures syntactic coherence in the newly generated samples. Given an input with a label, AugmentGAN aims to generate a semantically similar sequence that follows the syntactic structure of the original sample. We conduct an exhaustive task-based evaluation to show the efficacy of AugmentGAN, employing 12 datasets across five classification tasks: sentiment analysis, emotion recognition, sarcasm detection, intent classification, and spam detection. We observe that, compared with existing text augmentation techniques, AugmentGAN yields improved performance across datasets for all the tasks. AugmentGAN also proves effective for multiple languages, namely English, Hindi, and Bengali.
Pages: 1323-1332
Page count: 10