Graph-aware pre-trained language model for political sentiment analysis in Filipino social media

Cited by: 0
Authors
Aquino, Jean Aristide [1]
Liew, Di Jie [1]
Chang, Yung-Chun [1,2]
Affiliations
[1] Taipei Med Univ, Grad Inst Data Sci, Taipei, Taiwan
[2] Taipei Med Univ Hosp, Clin Big Data Res Ctr, Taipei, Taiwan
Keywords
Politics; Sentiment analysis; Social media analytics; Pre-trained language model; Graph convolution network; Topic modeling
DOI
10.1016/j.engappai.2025.110317
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Elections are emotionally and sentimentally charged events that offer unique opportunities for analysis of sentiments not typically observed during non-election periods. Unlike recurring phenomena, elections are inherently singular events, with each election shaped by distinct political, social, and cultural contexts. In the digital age, social media has become a direct channel for politicians and political parties to engage with voters, making it a critical platform for sentiment analysis. However, challenges such as imbalanced datasets, the prevalence of noisy non-text elements (e.g., emojis, hashtags, user mentions), and the need for effective integration of graph-based learning remain significant hurdles in sentiment prediction. To address these challenges, we constructed an imbalanced dataset of 8035 manually annotated tweets and approximately 516,000 weakly labeled Filipino tweets related to the 2022 Philippine National Election. Leveraging these datasets, we designed a Bidirectional Encoder Representations from Transformers (BERT) and Graph Convolution Network (GCN) model, which uniquely incorporates emojis, hashtags, and user mentions as features to enhance semantic understanding. Differing from the prior literature that focused solely on textual data or discarded non-textual elements, our model integrates these features to achieve a robust performance that outperforms baseline models with a macro-recall score of 64.73% and a macro F1-score of 68.72% on the imbalanced dataset. Additionally, we introduce a topic modeling framework that combines BERT embeddings with Latent Dirichlet Allocation (LDA) and Log-Likelihood Ratio (LLR) to yield more distinct topic clusters for deeper sentiment analysis. Our work therefore contributes two novel datasets in Filipino as well as methodologies that bridge sentiment prediction and analysis, and in so doing, provides valuable resources for future research.
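The abstract reports macro-averaged recall and F1 on an imbalanced dataset. As a minimal sketch of how such macro-averaged scores are generally computed (not the authors' evaluation code), the hypothetical helper `macro_scores` below gives every class equal weight regardless of its frequency. The label names `neg`, `neu`, and `pos` are illustrative assumptions; the abstract does not state the label set.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro_recall, macro_f1) with every class weighted equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    recalls, f1s = [], []
    for c in labels:
        pred_c = tp[c] + fp[c]  # number of predictions of class c
        true_c = tp[c] + fn[c]  # number of true instances of class c
        precision = tp[c] / pred_c if pred_c else 0.0
        recall = tp[c] / true_c if true_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return sum(recalls) / n, sum(f1s) / n


# Hypothetical toy labels (assumed for illustration): one class dominates.
y_true = ["neg", "neg", "neg", "neg", "pos", "neu"]
y_pred = ["neg", "neg", "neg", "pos", "pos", "neg"]
macro_recall, macro_f1 = macro_scores(y_true, y_pred)
```

In this toy example, four of the six predictions are correct, yet the single missed minority-class item pulls both macro scores well below that accuracy. This is why macro averaging is commonly preferred when classes are imbalanced.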
Pages: 15