Balanced and Explainable Social Media Analysis for Public Health with Large Language Models

Cited by: 2
Authors
Jiang, Yan [1]
Qiu, Ruihong [1]
Zhang, Yi [1]
Zhang, Peng-Fei [1]
Affiliations
[1] Univ Queensland, Brisbane, Qld, Australia
Source
DATABASES THEORY AND APPLICATIONS, ADC 2023 | 2024 / Vol. 14386
Keywords
Public Health; Social Media; Text Classification;
DOI
10.1007/978-3-031-47843-7_6
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
As social media becomes increasingly popular, more and more public health activity emerges on it, which is worth noting for pandemic monitoring and government decision-making. Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs). Although recent progress in LLMs has shown a strong ability to comprehend knowledge after fine-tuning on domain-specific datasets, training an in-domain LLM for every specific public health task is especially expensive. Furthermore, such in-domain datasets from social media are generally highly imbalanced, which hinders the efficiency of LLM tuning. To tackle these challenges, the data imbalance issue can be overcome by sophisticated data augmentation methods for social media datasets. In addition, the ability of LLMs can be effectively utilised by prompting the model properly. In light of the above, this paper proposes a novel ALEX framework for social media analysis on public health. Specifically, an augmentation pipeline is developed to resolve the data imbalance issue. Furthermore, an LLM explanation mechanism is proposed that prompts an LLM with the predicted results from BERT models. Extensive experiments on three tasks at the Social Media Mining for Health 2023 (SMM4H) competition, with first-place rankings in two tasks, demonstrate the superior performance of the proposed ALEX method. Our code has been released at https://github.com/YanJiangJerry/ALEX.
Pages: 73-86
Page count: 14