Lexical data augmentation for sentiment analysis

Cited by: 22
Authors
Xiang, Rong [1]
Chersoni, Emmanuele [1]
Lu, Qin [1]
Huang, Chu-Ren [1]
Li, Wenjie [1]
Long, Yunfei [2]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] Univ Essex, Colchester, Essex, England
DOI
10.1002/asi.24493
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks, including sentiment analysis. However, deep learning models demand more training data. Data augmentation techniques, which generate new instances by modifying existing data or by drawing on external knowledge bases, are widely used to address the scarcity of annotated data, a scarcity that keeps machine learning techniques from reaching their full potential. This paper presents our work on part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify the words to be replaced and investigate different augmentation strategies for finding semantically related substitutions when generating new instances. The choice of POS tags and a variety of strategies, such as semantic-based substitution methods and sampling methods, are discussed in detail. Performance evaluation focuses on comparing PLSDA with two previous lexical substitution-based data augmentation methods, one thesaurus-based and the other based on lexicon manipulation. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% over the two previous lexical substitution methods, averaged over the five benchmarks. Introducing a POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
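The core idea in the abstract is to replace only words carrying selected POS tags, using semantically related words that pass a similarity threshold. The sketch below illustrates that idea and is not the authors' implementation. The toy embedding table, the hand-assigned POS tags, the `candidates` and `augment` helpers, and the threshold value are all assumptions made for this example.

```python
import math
import random

# Hypothetical toy embeddings; a real setup would load pretrained word vectors.
EMBEDDINGS = {
    "good": (0.9, 0.1, 0.0),
    "great": (0.85, 0.15, 0.05),
    "fine": (0.7, 0.3, 0.1),
    "bad": (-0.9, 0.1, 0.0),
    "movie": (0.0, 0.9, 0.1),
    "film": (0.05, 0.88, 0.12),
    "plot": (0.1, 0.6, 0.5),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidates(word, threshold):
    """Words whose similarity to `word` meets the threshold, sorted by name."""
    if word not in EMBEDDINGS:
        return []
    base = EMBEDDINGS[word]
    return sorted(
        other
        for other, vec in EMBEDDINGS.items()
        if other != word and cosine(base, vec) >= threshold
    )


def augment(tagged_sentence, target_pos, threshold, n_new, seed=0):
    """Build single-word substitutions at target POS slots, then sample n_new."""
    tokens = [tok for tok, _ in tagged_sentence]
    pool = []
    for i, (tok, pos) in enumerate(tagged_sentence):
        if pos not in target_pos:
            continue  # the POS constraint: leave all other words untouched
        for sub in candidates(tok, threshold):
            pool.append(" ".join(tokens[:i] + [sub] + tokens[i + 1:]))
    rng = random.Random(seed)
    return rng.sample(pool, min(n_new, len(pool)))


# Hand-tagged example sentence (the tags are assumed, not produced by a tagger).
sentence = [("a", "DET"), ("good", "ADJ"), ("movie", "NOUN")]
new_instances = augment(sentence, {"ADJ"}, threshold=0.9, n_new=5)
```

In this toy setting the similarity threshold keeps the antonym "bad" out of the candidates for "good", and restricting `target_pos` to adjectives leaves the noun unchanged. The threshold and `n_new` play the role of the two hyperparameters the abstract mentions.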
Pages: 1432-1447
Number of pages: 16
Related papers
50 in total
  • [21] Data Augmentation for Sentiment Analysis Using Sentence Compression-Based SeqGAN With Data Screening
    Luo, Jiawei
    Bouazizi, Mondher
    Ohtsuki, Tomoaki
    IEEE ACCESS, 2021, 9 : 99922 - 99931
  • [22] Data augmentation for sentiment classification with semantic preservation and diversity
    Chao, Guoqing
    Liu, Jingyao
    Wang, Mingyu
    Chu, Dianhui
    KNOWLEDGE-BASED SYSTEMS, 2023, 280
  • [23] Reinforced Counterfactual Data Augmentation for Dual Sentiment Classification
    Chen, Hao
    Xia, Rui
    Yu, Jianfei
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 269 - 278
  • [24] SENTIMENT CLASSIFICATION OF UNSTRUCTURED DATA USING LEXICAL BASED TECHNIQUES
    Shamsudin, Nurul Fathiyah
    Basiron, Halizah
    Saaya, Zurina
    Rahman, Ahmad Fadzli Nizam Abdul
    Zakaria, Mohd Hafiz
    Hassim, Nurulhalim
    JURNAL TEKNOLOGI, 2015, 77 (18): : 113 - 120
  • [25] Guide for the application of the data augmentation approach on sets of texts in Spanish for sentiment and emotion analysis
    Benitez, Rodrigo Gutierrez
    Navarrete, Alejandra Segura
    Vidal-Castro, Christian
    Martinez-Araneda, Claudia
    PLOS ONE, 2024, 19 (09):
  • [26] Text Sentiment Analysis Based on Transformer and Augmentation
    Gong, Xiaokang
    Ying, Wenhao
    Zhong, Shan
    Gong, Shengrong
    FRONTIERS IN PSYCHOLOGY, 2022, 13
  • [27] Integration of Lexical and Semantic Knowledge for Sentiment Analysis in SMS
    Khiari, Wejdene
    Bouhafs, Asma
    Roche, Mathieu
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 1185 - 1189
  • [28] Understanding Customer Sentiment: Lexical Analysis of Restaurant Reviews
    Ara, Jinat
    Hasan, Md Toufique
    Al Omar, Abdullah
    Bhuiyan, Lianif
    2020 IEEE REGION 10 SYMPOSIUM (TENSYMP) - TECHNOLOGY FOR IMPACTFUL SUSTAINABLE DEVELOPMENT, 2020, : 295 - 299
  • [29] Chinese Lexical based Sentiment Analysis Framework in Meteorology
    Li, Yinan
    Zhang, Fuquan
    Zhu, Yifan
    Zhang, Sifan
    Mao, Yu
    Niu, Zhendong
    2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1652 - 1658
  • [30] Marathi SentiWordNet: A lexical resource for sentiment analysis of Marathi
    Shelke, Mahesh B.
    Sawant, Daivat D.
    Kadam, Chatrabhuj B.
    Ambhure, Kailas
    Deshmukh, Sachin N.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (02):