Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding

Cited by: 56
Authors
Xiao, Ziang [1]
Yuan, Xingdi [1]
Liao, Q. Vera [1]
Abdelghani, Rania [2]
Oudeyer, Pierre-Yves [2]
Affiliations
[1] Microsoft Res, Montreal, PQ, Canada
[2] INRIA, Paris, France
Source
COMPANION PROCEEDINGS OF 2023 28TH ANNUAL CONFERENCE ON INTELLIGENT USER INTERFACES, IUI 2023 COMPANION | 2023
Keywords
Qualitative Analysis; Deductive Coding; Large Language Model; GPT-3;
DOI
10.1145/3581754.3584136
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Qualitative analysis of textual content unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, and may further be challenged by the limited generalizability of those task-specific models. In this study, we explored the use of large language models (LLMs) in supporting deductive coding, a major category of qualitative analysis where researchers use pre-determined codebooks to label the data with a fixed set of codes. Instead of training task-specific models, a pre-trained LLM can be used directly for various tasks without fine-tuning, through prompt learning. Using a curiosity-driven question coding task as a case study, we found that, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreement with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond.
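The workflow summarized in the abstract (a codebook placed in a prompt, then agreement measured against expert labels) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the two example codes, and the helper names `build_prompt` and `cohens_kappa` are all hypothetical, and Cohen's kappa is assumed as the agreement statistic because "fair to substantial" matches the usual vocabulary for interpreting kappa. No model is called; the model labels below are made-up placeholders.

```python
from collections import Counter


def build_prompt(codebook, text):
    """Assemble a prompt that lists every code with its description.

    The wording is a hypothetical template, not the one used in the paper.
    """
    lines = ["Assign exactly one code from the codebook to the text.", "Codebook:"]
    for code, description in codebook.items():
        lines.append(f"- {code}: {description}")
    lines.append(f"Text: {text}")
    lines.append("Code:")
    return "\n".join(lines)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both coders chose the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each coder's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical two-code codebook and placeholder labels, for illustration only.
codebook = {
    "factual": "The question asks for a specific fact.",
    "explanatory": "The question asks why or how something happens.",
}
prompt = build_prompt(codebook, "Why is the sky blue?")

expert_labels = ["factual", "factual", "explanatory", "explanatory"]
model_labels = ["factual", "factual", "explanatory", "factual"]
kappa = cohens_kappa(expert_labels, model_labels)  # 0.5 for these placeholder labels
```

In practice the prompt string would be sent to an LLM and the returned code parsed into `model_labels`; that step is omitted here because it depends on the API in use.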
Pages: 75-78
Page count: 4