Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment

Cited by: 11
Authors
Kjell, Oscar N. E. [1 ,2 ]
Kjell, Katarina [1 ]
Schwartz, H. Andrew [1 ,2 ]
Affiliations
[1] Lund Univ, Psychol Dept, Lund, Sweden
[2] SUNY Stony Brook Univ, Comp Sci Dept, Stony Brook, NY USA
Funding
Swedish Research Council;
Keywords
Large language models; Transformers; Artificial intelligence; Psychology; Assessment; ITEM RESPONSE THEORY; SOCIAL MEDIA; PSYCHIATRIC-DIAGNOSIS; WORDS; AI;
DOI
10.1016/j.psychres.2023.115667
Chinese Library Classification
R749 [Psychiatry];
Discipline Code
100205;
Abstract
In this narrative review, we survey recent empirical evaluations of AI-based language assessments and argue that the technology of large language models is poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported "paradigm shift" driven by a new class of machine learning models, large language models (e.g., BERT, LLaMA, and the model behind ChatGPT). These models have achieved unprecedented accuracy across most computerized language processing tasks, from web search to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of large language models is largely attributed to their capability to numerically represent words in their context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence suggesting that, with thorough validation of targeted deployment scenarios, AI's newest technology can move mental health assessment away from rating scales and toward how people naturally communicate: in language.
Pages: 12