Graph-aware pre-trained language model for political sentiment analysis in Filipino social media

Cited by: 0
Authors
Aquino, Jean Aristide [1]
Liew, Di Jie [1]
Chang, Yung-Chun [1, 2]
Affiliations
[1] Taipei Med Univ, Grad Inst Data Sci, Taipei, Taiwan
[2] Taipei Med Univ Hosp, Clin Big Data Res Ctr, Taipei, Taiwan
Keywords
Politics; Sentiment analysis; Social media analytics; Pre-trained language model; Graph convolution network; Topic modeling
DOI
10.1016/j.engappai.2025.110317
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Elections are emotionally and sentimentally charged events that offer unique opportunities for analysis of sentiments not typically observed during non-election periods. Unlike recurring phenomena, elections are inherently singular events, with each election shaped by distinct political, social, and cultural contexts. In the digital age, social media has become a direct channel for politicians and political parties to engage with voters, making it a critical platform for sentiment analysis. However, challenges such as imbalanced datasets, the prevalence of noisy non-text elements (e.g., emojis, hashtags, user mentions), and the need for effective integration of graph-based learning remain significant hurdles in sentiment prediction. To address these challenges, we constructed an imbalanced dataset of 8035 manually annotated tweets and approximately 516,000 weakly labeled Filipino tweets related to the 2022 Philippine National Election. Leveraging these datasets, we designed a Bidirectional Encoder Representations from Transformers (BERT) and Graph Convolution Network (GCN) model, which uniquely incorporates emojis, hashtags, and user mentions as features to enhance semantic understanding. Differing from the prior literature that focused solely on textual data or discarded non-textual elements, our model integrates these features to achieve a robust performance that outperforms baseline models with a macro-recall score of 64.73% and a macro F1-score of 68.72% on the imbalanced dataset. Additionally, we introduce a topic modeling framework that combines BERT embeddings with Latent Dirichlet Allocation (LDA) and Log-Likelihood Ratio (LLR) to yield more distinct topic clusters for deeper sentiment analysis. Our work therefore contributes two novel datasets in Filipino as well as methodologies that bridge sentiment prediction and analysis, and in so doing, provides valuable resources for future research.
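The abstract reports macro-recall and macro F1 rather than accuracy because the dataset is imbalanced. The following is a minimal sketch of how these two macro-averaged scores are commonly computed; it is not the authors' evaluation code, and the function name `macro_recall_f1`, the labels, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_recall_f1(y_true, y_pred):
    """Return (macro_recall, macro_f1), averaged over the labels in y_true."""
    labels = sorted(set(y_true))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    recalls, f1s = [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)

    # Macro averaging: every class counts equally, regardless of its size.
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical toy data: 8 "negative" and 2 "positive" examples,
# scored against a classifier that always predicts the majority class.
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 10
macro_recall, macro_f1 = macro_recall_f1(y_true, y_pred)
```

In this toy case the majority-class predictor is right on 8 of 10 examples, yet its macro-recall is only 0.5 because the minority class is never recovered. That gap is why macro-averaged scores are a more informative summary than accuracy on imbalanced data.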
Pages: 15