Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5

Cited by: 13
Authors
Suri, Gaurav [1]
Slater, Lily R. [1]
Ziaee, Ali [1]
Nguyen, Morgan [1]
Affiliation
[1] San Francisco State Univ, Dept Psychol, Mind Brain & Behav, 1600 Holloway Ave, San Francisco, CA 94132 USA
Keywords
natural language processing; large language models; ChatGPT; heuristics; physicians; judgment; choice
DOI
10.1037/xge0001547
Chinese Library Classification (CLC) number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
A Large Language Model (LLM) is an artificial intelligence system trained on vast amounts of natural language data, enabling it to generate human-like responses to written or spoken language input. Generative Pre-trained Transformer 3.5 (GPT-3.5) is an example of an LLM; it underlies the conversational agent ChatGPT. In this work, we used a series of novel prompts to determine whether ChatGPT exhibits heuristics and other context-sensitive responses, and we presented the same prompts to human participants. Across four studies, we found that ChatGPT was influenced by random anchors when making estimates (anchoring, Study 1); it judged the likelihood of two events occurring together to be higher than the likelihood of either event occurring alone, and it was influenced by anecdotal information (representativeness and availability heuristics, Study 2); it judged an item to be more efficacious when its features were presented positively rather than negatively, even though both presentations contained statistically equivalent information (framing effect, Study 3); and it valued an owned item more than a newly found item, even though the two items were objectively identical (endowment effect, Study 4). In each study, human participants showed similar effects. Heuristics and context-sensitive responses in humans are thought to be driven by cognitive and affective processes such as loss aversion and effort reduction. The fact that an LLM, which lacks these processes, also shows such responses raises the possibility that language is rich enough to carry these effects and may play a role in generating them in humans.
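Two of the findings above rest on elementary probability facts. A joint event can never be more probable than either of its component events, so P(A and B) ≤ min(P(A), P(B)); a judgment above that bound is a conjunction fallacy (Study 2). And a positive frame stating a success rate and a negative frame stating a failure rate are statistically equivalent when P(success) = 1 − P(failure) (Study 3). The minimal Python sketch below makes both checks explicit. It is an illustration only, not the authors' method: the function names and example probabilities are hypothetical and do not reproduce the prompts or data used in the studies.

```python
def violates_conjunction_rule(p_a: float, p_b: float, p_a_and_b: float) -> bool:
    """Hypothetical helper: True if a judged joint probability breaks the conjunction rule.

    Probability theory requires P(A and B) <= min(P(A), P(B)); any judged
    joint probability above that bound is a conjunction fallacy.
    """
    for p in (p_a, p_b, p_a_and_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probabilities must lie in [0, 1], got {p}")
    return p_a_and_b > min(p_a, p_b)


def frames_are_equivalent(p_success: float, p_failure: float, tol: float = 1e-9) -> bool:
    """Hypothetical helper: True if a positive frame (stated success rate) and a
    negative frame (stated failure rate) describe the same outcome, i.e.
    P(success) = 1 - P(failure), up to a floating-point tolerance.
    """
    return abs(p_success - (1.0 - p_failure)) <= tol


if __name__ == "__main__":
    # Illustrative numbers only; these are not data from the studies.
    print(violates_conjunction_rule(0.3, 0.6, 0.5))  # → True, since 0.5 > min(0.3, 0.6)
    print(frames_are_equivalent(0.8, 0.2))           # → True: "80% succeed" vs "20% fail"
```

For example, a hypothetical respondent who rates P(A) = 0.3, P(B) = 0.6 and P(A and B) = 0.5 commits a conjunction fallacy, because 0.5 exceeds the upper bound of 0.3 for the joint event.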
Pages: 1066-1075
Number of pages: 10
Related papers
16 in total
  • [1] How do large language models answer breast cancer quiz questions? A comparative study of GPT-3.5, GPT-4 and Google Gemini
    Irmici, Giovanni
    Cozzi, Andrea
    Della Pepa, Gianmarco
    De Berardinis, Claudia
    D'Ascoli, Elisa
    Cellina, Michaela
    Ce, Maurizio
    Depretto, Catherine
    Scaperrotta, Gianfranco
    RADIOLOGIA MEDICA, 2024, 129 (10): : 1463 - 1467
  • [2] Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
    Farhat, Faiza
    Chaudhry, Beenish Moalla
    Nadeem, Mohammad
    Sohail, Shahab Saquib
    Madsen, Dag Oivind
    JMIR MEDICAL EDUCATION, 2024, 10
  • [4] Evaluating the GPT-3.5 and GPT-4 Large Language Models for Zero-Shot Classification of South African Violent Event Data
    Kotze, Eduan
    Senekal, Burgert A.
    2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, BIG DATA, COMPUTING AND DATA COMMUNICATION SYSTEMS, ICABCD 2024, 2024,
  • [5] Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources
    Srinivasan, Nitin
    Samaan, Jamil S.
    Rajeev, Nithya D.
    Kanu, Mmerobasi U.
    Yeo, Yee Hui
    Samakar, Kamran
    SURGICAL ENDOSCOPY AND OTHER INTERVENTIONAL TECHNIQUES, 2024, 38 (05): : 2522 - 2532
  • [7] Comparative evaluation of artificial intelligence models GPT-4 and GPT-3.5 in clinical decision-making in sports surgery and physiotherapy: a cross-sectional study
    Saglam, Sönmez
    Uludag, Veysel
    Karaduman, Zekeriya Okan
    Arıcan, Mehmet
    Yücel, Mücahid Osman
    Dalaslan, Raşit Emin
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 25 (1)
  • [8] Evaluating large language models for surgical chart review of second stage implant-based breast reconstruction: a comparative analysis of manual review, GPT-3.5 Turbo, and GPT-4 Turbo
    Lakhlani, Devi
    Dadhania, Dhruv
    Nazerali, Rahim
    EUROPEAN JOURNAL OF PLASTIC SURGERY, 2025, 48 (01)
  • [9] Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing
    Kumari, Amita
    Kumari, Anita
    Singh, Amita
    Singh, Sanjeet K.
    Juhi, Ayesha
    Dhanvijay, Anup Kumar D.
    Pinjar, Mohammed Jaffer
    Mondal, Himel
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (08)
  • [10] A Study Case of Automatic Archival Research and Compilation using Large Language Models
    Guo, Dongsheng
    Yue, Aizhen
    Ning, Fanggang
    Huang, Dengrong
    Chang, Bingxin
    Duan, Qiang
    Zhang, Lianchao
    Chen, Zhaoliang
    Zhang, Zheng
    Zhan, Enhao
    Zhang, Qilai
    Jiang, Kai
    Li, Rui
    Zhao, Shaoxiang
    Wei, Zizhong
    2023 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH, ICKG, 2023, : 52 - 59