Lexical data augmentation for sentiment analysis

Times cited: 22
Authors
Xiang, Rong [1]
Chersoni, Emmanuele [1]
Lu, Qin [1]
Huang, Chu-Ren [1]
Li, Wenjie [1]
Long, Yunfei [2]
Affiliations
[1] Hong Kong Polytech Univ, Hong Kong, Peoples R China
[2] Univ Essex, Colchester, Essex, England
DOI
10.1002/asi.24493
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks, including sentiment analysis. However, deep learning models require more training data. Scarcity of annotated data hinders the full potential of machine learning techniques, and data augmentation techniques are widely used to address it, generating new instances either by modifying existing data or by drawing on external knowledge bases. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify the words to be replaced and investigate different augmentation strategies for finding semantically related substitutes when generating new instances. The choice of POS tags, as well as a variety of strategies such as semantic-based substitution methods and sampling methods, is discussed in detail. Performance evaluation focuses on comparing PLSDA with two previous lexical substitution-based data augmentation methods, one thesaurus-based and the other based on lexicon manipulation. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, Bidirectional Encoder Representations from Transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% over the two previous lexical substitution methods, averaged across the five benchmarks. Introducing a POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
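The abstract describes the augmentation pipeline only at a high level. The sketch below illustrates the general idea of POS-constrained lexical substitution with a candidate similarity threshold and a cap on the number of new instances; it is not the authors' PLSDA implementation. The POS lexicon, the three-dimensional word vectors, the function names `candidates` and `augment`, and the uniform random sampling step are all hypothetical stand-ins for a real POS tagger, pretrained embeddings, and the sampling strategies the paper evaluates.

```python
import math
import random

# HYPOTHETICAL toy resources for illustration only. A real pipeline would
# tag tokens with a POS tagger and look up pretrained word embeddings.
POS = {
    "the": "DET", "movie": "NOUN", "film": "NOUN", "was": "VERB",
    "great": "ADJ", "excellent": "ADJ", "terrific": "ADJ", "awful": "ADJ",
    "really": "ADV", "truly": "ADV",
}
VEC = {
    "movie": (0.90, 0.10, 0.00), "film": (0.88, 0.12, 0.02),
    "great": (0.10, 0.90, 0.10), "excellent": (0.12, 0.88, 0.10),
    "terrific": (0.15, 0.85, 0.20), "awful": (0.10, -0.90, 0.10),
    "really": (0.00, 0.20, 0.90), "truly": (0.02, 0.22, 0.88),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def candidates(word, threshold):
    """Hypothetical helper: substitutes that share the word's POS tag and
    whose cosine similarity to it is at least `threshold`."""
    if word not in VEC:
        return []
    return [
        w for w in VEC
        if w != word
        and POS.get(w) == POS.get(word)
        and cosine(VEC[word], VEC[w]) >= threshold
    ]


def augment(tokens, target_pos=("ADJ", "ADV"), threshold=0.95, n_new=2, seed=0):
    """Hypothetical helper: build every single-word substitution of a token
    whose POS is in `target_pos`, then draw up to `n_new` of them uniformly
    at random (an assumed sampling strategy)."""
    variants = []
    for i, tok in enumerate(tokens):
        if POS.get(tok) not in target_pos:
            continue  # POS constraint: only tokens with selected tags are replaced
        for cand in candidates(tok, threshold):
            variants.append(tokens[:i] + [cand] + tokens[i + 1:])
    rng = random.Random(seed)
    return rng.sample(variants, min(n_new, len(variants)))


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "really", "great"]
    for new in augment(sentence, n_new=2):
        print(" ".join(new))
```

In this toy setup, "movie" is never replaced because NOUN is not among the target tags, and "awful" is excluded as a substitute for "great" because its vector points the opposite way. Passing a very low threshold (for example `candidates("great", -1.0)`) lets "awful" through, which would flip the sentiment of the generated instance; this illustrates why the similarity threshold is worth tuning.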
Pages: 1432-1447
Number of pages: 16
Related papers
50 in total
  • [41] An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis
    Mehmood, Khawar
    Essam, Daryl
    Shafi, Kamran
    Malik, Muhammad Kamran
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (06)
  • [42] Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification
    Melville, Prem
    Gryc, Wojciech
    Lawrence, Richard D.
    KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2009: 1275-1283
  • [43] Sentiment Analysis of Public Complaints Using Lexical Resources Between Indonesian Sentiment Lexicon and Sentiwordnet
    Lailiyah, M.
    Sumpeno, S.
    Purnama, I. K. E.
    2017 INTERNATIONAL SEMINAR ON INTELLIGENT TECHNOLOGY AND ITS APPLICATIONS (ISITIA), 2017: 307-312
  • [44] Multi-Task Learning Model with Data Augmentation for Arabic Aspect-Based Sentiment Analysis
    Fadel, Arwa Saif
    Abulnaja, Osama Ahmed
    Saleh, Mostafa Elsayed
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (02): 4419-4444
  • [45] Enhancing aspect-based sentiment analysis using data augmentation based on back-translation
    Taheri, Alireza
    Zamanifar, Azadeh
    Farhadi, Amirfarhad
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024: 491-516
  • [46] Aspect sentiment triplet extraction based on data augmentation and task feedback
    Liu, Shu
    Lu, Tingting
    Li, Kaiwen
    Liu, Weihua
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024: 1659-1683
  • [47] Sentiment Analysis on Weibo Data
    Li, Di
    Niu, Jianwei
    Qiu, Meikang
    Liu, Meiqin
    2014 IEEE COMPUTING, COMMUNICATIONS AND IT APPLICATIONS CONFERENCE (COMCOMAP), 2014: 249-254
  • [48] Sentiment Analysis of Twitter Data
    Desai, Radhi D.
    PROCEEDINGS OF THE 2018 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS), 2018: 114-117
  • [49] Sentiment Analysis of Customer Data
    Grljevic, Olivera
    Bosnjak, Zita
    STRATEGIC MANAGEMENT, 2018, 23 (03): 38-49
  • [50] Sentiment analysis of customer data
    Tarnowska, Katarzyna A.
    Ras, Zbigniew W.
    WEB INTELLIGENCE, 2019, 17 (04): 343-363