The Efficacy of Large Language Models and Crowd Annotation for Accurate Content Analysis of Political Social Media Messages

Cited: 0
Authors
Stromer-Galley, Jennifer [1 ,2 ]
McKernan, Brian [3 ]
Zaman, Saklain [2 ]
Maganur, Chinmay [1 ]
Regmi, Sampada [1 ]
Affiliations
[1] Syracuse Univ, Syracuse, NY USA
[2] Syracuse Univ, Sch Informat Studies, 343 Hinds Hall, Syracuse, NY 13244 USA
[3] Pace Univ, Dept Commun & Media Studies, New York, NY USA
Keywords
large language models; artificial intelligence; crowdsourcing; content analysis; social media; machine learning; candidates; Twitter
DOI
10.1177/08944393251334977
CLC number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Systematic content analysis of messaging has been a staple method in the study of communication. While computer-assisted content analysis has been used in the field for three decades, advances in machine learning and crowd-based annotation, combined with the ease of collecting large volumes of text-based communication via social media, have made message classification easier and faster. The greatest advancement yet may come from general-purpose large language models (LLMs), which are ostensibly able to classify messages accurately and reliably by leveraging context to disambiguate meaning. It is unclear, however, how effective LLMs are at performing content analysis. In this study, we compare the classification of political candidates' social media messages by trained annotators, crowd annotators, and large language models from OpenAI accessed through the free web interface (ChatGPT) and the paid API (GPT API), across five categories of political communication commonly used in the literature. We find that crowd annotation generally achieved higher F1 scores than ChatGPT and an earlier version of the GPT API, although the newest version, the GPT-4 API, performed well compared with the crowd and with ground-truth data derived from trained student annotators. This study suggests that applying any LLM to an annotation task requires validation, and that freely available and older LLM models may not be effective for studying human communication.
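The abstract's comparison rests on per-category F1 scores measured against ground truth from trained annotators. As a minimal sketch of how such a comparison works (the labels below are hypothetical, not from the study), per-label F1 can be computed directly from true positives, false positives, and false negatives:

```python
# Sketch: per-category F1 for binary message labels, comparing a
# candidate annotator (crowd majority vote or LLM output) against
# "ground truth" from trained annotators. All labels are made up.

def f1(gold, pred, positive=1):
    """F1 of `pred` against `gold` for the `positive` label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for one category of political communication:
gold  = [1, 0, 1, 1, 0, 0, 1, 0]   # trained annotators (ground truth)
crowd = [1, 0, 1, 0, 0, 1, 1, 0]   # crowd majority vote
llm   = [1, 1, 1, 0, 0, 1, 0, 0]   # LLM classification

print(f"crowd F1: {f1(gold, crowd):.3f}")   # prints "crowd F1: 0.750"
print(f"LLM F1:   {f1(gold, llm):.3f}")     # prints "LLM F1:   0.500"
```

In practice, as the study emphasizes, such validation against trained-annotator ground truth should be repeated for each category and each model version, since performance can differ sharply between them.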
Pages: 22