The Efficacy of Large Language Models and Crowd Annotation for Accurate Content Analysis of Political Social Media Messages

Cited by: 0
Authors
Stromer-Galley, Jennifer [1,2]
McKernan, Brian [3]
Zaman, Saklain [2]
Maganur, Chinmay [1]
Regmi, Sampada [1]
Affiliations
[1] Syracuse Univ, Syracuse, NY, USA
[2] Syracuse Univ, Sch Informat Studies, 343 Hinds Hall, Syracuse, NY 13244, USA
[3] Pace Univ, Dept Commun & Media Studies, New York, NY, USA
Keywords
large language models; artificial intelligence; crowdsourcing; content analysis; social media; machine learning; candidates; Twitter
DOI
10.1177/08944393251334977
CLC Classification
TP39 [computer applications]
Discipline Codes
081203; 0835
Abstract
Systematic content analysis of messaging has been a staple method in the study of communication. While computer-assisted content analysis has been used in the field for three decades, advances in machine learning and crowd-based annotation, combined with the ease of collecting large volumes of text-based communication via social media, have made classifying messages easier and faster. The greatest advance yet may be general-purpose large language models (LLMs), which are ostensibly able to classify messages accurately and reliably by leveraging context to disambiguate meaning. It is unclear, however, how effectively LLMs perform content analysis. In this study, we compare classifications of political candidates' social media messages produced by trained annotators, crowd annotators, and OpenAI large language models accessed through the free web interface (ChatGPT) and the paid API (GPT API), across five categories of political communication commonly used in the literature. We find that crowd annotation generally achieved higher F1 scores than ChatGPT and an earlier version of the GPT API, although the newest version, GPT-4 accessed via the API, performed well against both the crowd and ground-truth data derived from trained student annotators. This study suggests that applying any LLM to an annotation task requires validation, and that freely available and older LLMs may not be effective for studying human communication.
Pages: 22
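
For readers who want to see what the comparison described in the abstract involves in practice, the following is a minimal Python sketch, not the authors' code: it labels a few messages with the OpenAI chat API and scores the output against trained-annotator labels using the F1 metric the abstract reports. The binary "attack"/"not attack" label set, the prompt wording, the example messages, and the gold labels are all illustrative assumptions standing in for the paper's five political-communication categories and its ground-truth data.

# Minimal sketch, not the authors' pipeline. Requires:
#   pip install openai scikit-learn
# and OPENAI_API_KEY set in the environment. The label set, prompt,
# messages, and gold labels below are hypothetical stand-ins for the
# paper's five categories and its trained-annotator ground truth.
from openai import OpenAI
from sklearn.metrics import f1_score

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = {"attack", "not attack"}  # hypothetical binary category

def classify(message: str) -> str:
    """Ask the model for exactly one label for one candidate message."""
    response = client.chat.completions.create(
        model="gpt-4",   # the strongest model compared in the study
        temperature=0,   # reduce run-to-run label variation
        messages=[
            {"role": "system",
             "content": ("You are a content-analysis annotator. Reply with "
                         "exactly one label: 'attack' or 'not attack'.")},
            {"role": "user", "content": message},
        ],
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "not attack"  # crude fallback

# Hypothetical messages with gold labels from trained annotators.
messages = ["My opponent voted to raise your taxes three times.",
            "Join us for the town hall on Saturday!"]
gold = ["attack", "not attack"]

predicted = [classify(m) for m in messages]
print("F1 vs. ground truth:", f1_score(gold, predicted, pos_label="attack"))

In a real replication one would batch hundreds of messages per category, compute F1 per category as the paper does, and run the same scoring over crowd-annotator labels to complete the three-way comparison.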