Dual Consistency-Enhanced Semi-Supervised Sentiment Analysis Towards COVID-19 Tweets

Cited by: 4
Authors
Sun, Teng [1]
Jing, Liqiang [1]
Wei, Yinwei [2]
Song, Xuemeng [1]
Cheng, Zhiyong [3]
Nie, Liqiang [4]
Affiliations
[1] Shandong Univ, Dept Comp Sci & Technol, Qingdao 266237, Shandong, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
[3] Qilu Univ Technol, Shandong Artificial Intelligence Inst, Shandong Acad Sci, Jinan 250316, Shandong, Peoples R China
[4] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Semi-supervised text classification; sentiment analysis; social media dataset on COVID-19
DOI
10.1109/TKDE.2023.3270940
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the context of COVID-19, many people express their opinions through social networks. Sentiment analysis of COVID-19 tweets is therefore highly desirable: it reveals public attitudes and helps governments set appropriate guidelines to avoid social unrest. Although many studies have addressed text-based sentiment classification in various domains (e.g., delivery and shopping reviews), these classifiers are hard to apply directly to COVID-19 tweets because of the domain gap. Developing a sentiment classifier for COVID-19 tweets is challenged mainly by the limited annotated training data and by the diverse, informal expressions found in user-generated posts. To address these challenges, we construct a large-scale COVID-19 dataset from Weibo and propose a dual COnsistency-enhanced semi-superVIseD network for Sentiment Analysis (COVID-SA). In particular, we first introduce a knowledge-based augmentation method to augment the data and enhance the model's robustness. We then employ BERT as the text encoder backbone for labeled, unlabeled, and augmented data. Moreover, we propose a dual consistency regularization (i.e., label-oriented consistency and instance-oriented consistency) to improve model performance. Extensive experiments on our self-constructed dataset and three public datasets show the superiority of COVID-SA over state-of-the-art baselines on various applications.
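The abstract names the two consistency terms but gives no formulas. As a rough illustration of the general idea of consistency regularization only, the sketch below penalizes disagreement between an unlabeled sample and its augmented copy in two ways: between predicted class distributions (here a KL divergence) and between encoder representations (here a cosine distance). The function names, the choice of KL and cosine distance, and the `weight` parameter are all assumptions made for this example and should not be read as the COVID-SA loss.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dual_consistency_loss(logits_orig, logits_aug, feat_orig, feat_aug, weight=1.0):
    """Hypothetical combined penalty for one (original, augmented) pair.

    Assumption: the prediction-level term is a KL divergence and the
    representation-level term is a cosine distance. The source does not
    specify either choice.
    """
    label_term = kl_divergence(softmax(logits_orig), softmax(logits_aug))
    instance_term = cosine_distance(feat_orig, feat_aug)
    return label_term + weight * instance_term


if __name__ == "__main__":
    # Toy values standing in for classifier logits and encoder features.
    loss = dual_consistency_loss(
        logits_orig=[2.0, 0.5, -1.0],
        logits_aug=[1.5, 0.7, -0.8],
        feat_orig=[0.2, 0.9, -0.4],
        feat_aug=[0.25, 0.8, -0.5],
    )
    print(f"consistency penalty: {loss:.4f}")
```

In a semi-supervised setup such a penalty would typically be added to the supervised cross-entropy on the labeled data, so that unlabeled posts still contribute a training signal.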
Pages: 12605-12617
Page count: 13