Beyond rating scales: With targeted evaluation, large language models are poised for psychological assessment

Cited by: 20

Authors:

Kjell, Oscar N. E. [1,2]

Kjell, Katarina [1]

Schwartz, H. Andrew [1,2]

Affiliations:

[1] Lund Univ, Psychol Dept, Lund, Sweden

[2] SUNY Stony Brook Univ, Comp Sci Dept, Stony Brook, NY USA

Source:

PSYCHIATRY RESEARCH | 2024 / Vol. 333

Funding:

Swedish Research Council;

Keywords:

Large language models; Transformers; Artificial intelligence; Psychology; Assessment; ITEM RESPONSE THEORY; SOCIAL MEDIA; PSYCHIATRIC-DIAGNOSIS; WORDS; AI;

DOI:

10.1016/j.psychres.2023.115667

Chinese Library Classification (CLC) number:

R749 [Psychiatry];

Discipline classification code:

100205;

Abstract:

In this narrative review, we survey recent empirical evaluations of AI-based language assessments and present a case that large language models are poised to change standardized psychological assessment. Artificial intelligence has been undergoing a purported "paradigm shift" initiated by new machine learning models, large language models (e.g., BERT, LLaMA, and the model behind ChatGPT). These models have led to unprecedented accuracy on most computerized language processing tasks, from web searches to automatic machine translation and question answering, while their dialogue-based forms, like ChatGPT, have captured the interest of over a million users. The success of the large language model is mostly attributed to its capability to numerically represent words in their context, long a weakness of previous attempts to automate psychological assessment from language. While potential applications for automated therapy are beginning to be studied on the heels of ChatGPT's success, here we present evidence suggesting that, with thorough validation of targeted deployment scenarios, AI's newest technology can move mental health assessment away from rating scales and toward how people naturally communicate: in language.

Pages: 12
