Can large language models help augment English psycholinguistic datasets?

Cited: 10
Author
Trott, Sean [1 ]
Affiliation
[1] Univ Calif San Diego, Dept Cognit Sci, 9500 Gilman Dr, La Jolla, CA 92093 USA
Keywords
Dataset; Psycholinguistic resource; Large language models; ChatGPT; Norms; Ratings; Acquisition; Age; Concreteness; Iconicity
DOI
10.3758/s13428-024-02337-z
Chinese Library Classification
B841 [Psychological research methods];
Subject Classification Code
040201;
Abstract
Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue of scale is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large, psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these judgments against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ from human-generated norms systematically. I also perform several "substitution analyses", which demonstrate that replacing human-generated norms with LLM-generated norms in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing the considerations and limitations associated with LLM-generated norms in general, including concerns of data contamination, the choice of LLM, external validity, construct validity, and data quality. Additionally, all of GPT-4's judgments (over 30,000 in total) are made available online for further analysis.
Pages: 6082-6100
Page count: 19