Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment

Cited by: 20

Authors:

Kjell, Oscar N. E. [1,2]

Kjell, Katarina [1]

Schwartz, H. Andrew [1,2]

Affiliations:

[1] Lund Univ, Psychol Dept, Lund, Sweden

[2] SUNY Stony Brook Univ, Comp Sci Dept, Stony Brook, NY USA

Source:

PSYCHIATRY RESEARCH | 2024 / Vol. 333

Funding:

Swedish Research Council;

Keywords:

Large language models; Transformers; Artificial intelligence; Psychology; Assessment; ITEM RESPONSE THEORY; SOCIAL MEDIA; PSYCHIATRIC-DIAGNOSIS; WORDS; AI;

DOI:

10.1016/j.psychres.2023.115667

Chinese Library Classification (CLC) number:

R749 [Psychiatry];

Discipline classification code:

100205 ;

Abstract:

In this narrative review, we survey recent empirical evaluations of AI-based language assessments and present a case that large language models are poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported "paradigm shift" initiated by new machine learning models, large language models (e.g., BERT, LLaMA, and the model behind ChatGPT). These models have led to unprecedented accuracy over most computerized language processing tasks, from web searches to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of the large language model is mostly attributed to its capability to numerically represent words in their context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence that suggests, with thorough validation of targeted deployment scenarios, that AI's newest technology can move mental health assessment away from rating scales and instead use how people naturally communicate: in language.


Pages: 12

50 entries in total

[21] A Brief Review on Benchmarking for Large Language Models Evaluation in Healthcare [J].

Budler, Leona Cilar ;

Chen, Hongyu ;

Chen, Aokun ;

Topaz, Maxim ;

Tam, Wilson ;

Bian, Jiang ;

Stiglic, Gregor .

WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2025, 15 (02)

[22] Large Language Models in Biochemistry Education: Comparative Evaluation of Performance [J].

Bolgova, Olena ;

Shypilova, Inna ;

Mavrych, Volodymyr .

JMIR MEDICAL EDUCATION, 2025, 11

[23] Application of Large Language Models in Medical Training Evaluation-Using ChatGPT as a Standardized Patient: Multimetric Assessment [J].

Wang, Chenxu ;

Li, Shuhan ;

Lin, Nuoxi ;

Zhang, Xinyu ;

Han, Ying ;

Wang, Xiandi ;

Liu, Di ;

Tan, Xiaomei ;

Pu, Dan ;

Li, Kang ;

Qian, Guangwu ;

Yin, Rong .

JOURNAL OF MEDICAL INTERNET RESEARCH, 2025, 27

[24] Performance Evaluation and Implications of Large Language Models in Radiology Board Exams: Prospective Comparative Analysis [J].

Wei, Boxiong .

JMIR MEDICAL EDUCATION, 2025, 11

[25] Beyond Accuracy and Robustness Metrics for Large Language Models for Code [J].

Rodriguez-Cardenas, Daniel .

2024 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS, ICSE-COMPANION 2024, 2024, :159-161

[26] Evaluation of large language models for providing educational information in orthokeratology care [J].

Huang, Yangyi ;

Shi, Runhan ;

Chen, Can ;

Zhou, Xueyi ;

Zhou, Xingtao ;

Hong, Jiaxu ;

Chen, Zhi .

CONTACT LENS & ANTERIOR EYE, 2025, 48 (03)

[27] Facial Analysis for Plastic Surgery in the Era of Artificial Intelligence: A Comparative Evaluation of Multimodal Large Language Models [J].

Haider, Syed Ali ;

Prabha, Srinivasagam ;

Gomez-Cabello, Cesar A. ;

Borna, Sahar ;

Genovese, Ariana ;

Trabilsy, Maissa ;

Elegbede, Adekunle ;

Yang, Jenny Fei ;

Galvao, Andrea ;

Tao, Cui ;

Forte, Antonio Jorge .

JOURNAL OF CLINICAL MEDICINE, 2025, 14 (10)

[28] Assessing the accuracy and consistency of large language models in triaging social media posts for psychological distress [J].

Settanni, Michele ;

Quilghini, Francesco ;

Toscano, Anna ;

Marengo, Davide .

PSYCHIATRY RESEARCH, 2025, 351

[29] Assessment of Large Language Models in Cataract Care Information Provision: A Quantitative Comparison [J].

Su, Zichang ;

Jin, Kai ;

Wu, Hongkang ;

Luo, Ziyao ;

Grzybowski, Andrzej ;

Ye, Juan .

OPHTHALMOLOGY AND THERAPY, 2025, 14 (01) :103-116

[30] A Study on Prompt Types for Harmlessness Assessment of Large-Scale Language Models [J].

Shin, Yejin ;

Kim, Song-yi ;

Byun, Eun Young .

HCI INTERNATIONAL 2024 POSTERS, PT VII, HCII 2024, 2024, 2120 :228-233
