Can large language models help augment English psycholinguistic datasets?

Cited by: 10
Author
Trott, Sean [1]
Affiliation
[1] Univ Calif San Diego, Dept Cognit Sci, 9500 Gilman Dr, La Jolla, CA 92093 USA
Keywords
Dataset; Psycholinguistic resource; Large language models; ChatGPT; NORMS; RATINGS; ACQUISITION; AGE; CONCRETENESS; ICONICITY;
DOI
10.3758/s13428-024-02337-z
Chinese Library Classification number
B841 [Research methods in psychology];
Discipline classification code
040201 ;
Abstract
Research on language and cognition relies extensively on psycholinguistic datasets or "norms". These datasets contain judgments of lexical properties like concreteness and age of acquisition, and can be used to norm experimental stimuli, discover empirical relationships in the lexicon, and stress-test computational models. However, collecting human judgments at scale is both time-consuming and expensive. This issue of scale is compounded for multi-dimensional norms and those incorporating context. The current work asks whether large language models (LLMs) can be leveraged to augment the creation of large, psycholinguistic datasets in English. I use GPT-4 to collect multiple kinds of semantic judgments (e.g., word similarity, contextualized sensorimotor associations, iconicity) for English words and compare these judgments against the human "gold standard". For each dataset, I find that GPT-4's judgments are positively correlated with human judgments, in some cases rivaling or even exceeding the average inter-annotator agreement displayed by humans. I then identify several ways in which LLM-generated norms differ from human-generated norms systematically. I also perform several "substitution analyses", which demonstrate that replacing human-generated norms with LLM-generated norms in a statistical model does not change the sign of parameter estimates (though in select cases, there are significant changes to their magnitude). I conclude by discussing the considerations and limitations associated with LLM-generated norms in general, including concerns of data contamination, the choice of LLM, external validity, construct validity, and data quality. Additionally, all of GPT-4's judgments (over 30,000 in total) are made available online for further analysis.
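The comparison against the human "gold standard" described in the abstract can be illustrated with a rank correlation between two sets of ratings. The following is a minimal sketch, not the author's analysis code: the abstract does not say which correlation coefficient was used, so Spearman's rho is an assumption here, and the rating values and the function names `average_ranks`, `pearson`, and `spearman` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ratings for five words (illustrative only, not data from the paper).
human_ratings = [4.8, 1.9, 3.2, 2.5, 4.1]
llm_ratings = [4.6, 2.2, 3.0, 2.1, 4.4]

rho = spearman(human_ratings, llm_ratings)
print(f"Spearman rho between human and LLM ratings: {rho:.2f}")
```

A coefficient computed this way could then be set beside the average inter-annotator agreement among human raters, which is the benchmark the abstract refers to.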
Pages: 6082-6100
Page count: 19