Using early LLMs for corpus linguistics: Examining ChatGPT's potential and limitations

Times cited: 7
Author
Uchida, Satoru [1]
Affiliation
[1] Kyushu University, Faculty of Languages and Cultures, 744 Motooka, Nishi-ku, Fukuoka, Japan
Source
APPLIED CORPUS LINGUISTICS | 2024 / Vol. 4 / No. 1
Keywords
LLM; ChatGPT; Corpus linguistics; Frequency list; Collocation
DOI
10.1016/j.acorp.2024.100089
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This study evaluates the extent to which early Large Language Models (LLMs) can supply information useful for corpus linguistic research. Using ChatGPT (GPT-3.5), various tasks were conducted: generating word frequency lists, collocations, and words that fit particular grammatical patterns, and identifying genres. The generated output was then compared with search results from a large-scale general corpus, the Corpus of Contemporary American English (COCA). While genre identification for words and paragraphs did not yield favorable results, the top 20 items showed notable agreement with COCA for frequency lists (75.0%), collocations (42.8%), and grammatical patterns (53.0%). Even where the generated items did not exactly match those from COCA, they were clearly high-frequency items. Although LLMs may not be sufficient for rigorous academic research, the results are adequate for discerning overall trends or assisting learners. The results also show that the ability to search at the phrase level is an advantage of using LLMs for corpus research.
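The abstract reports agreement percentages for the top 20 items but does not say how agreement was computed. As a rough illustration only, the Python sketch below assumes agreement means the share of a corpus-derived top-20 list that also appears in the LLM's top-20 list, averaged over several queries. The function names `top_n_overlap` and `mean_overlap` and the toy word lists are hypothetical and are not taken from the study.

```python
# Hypothetical sketch, not the study's published procedure: it ASSUMES that
# "agreement" is the share of reference (corpus) top-n items that also occur
# in the LLM-generated top-n list, averaged over several queries.


def top_n_overlap(generated, reference, n=20):
    """Return the fraction of the reference top-n items found in the generated top-n."""
    generated_set = {w.lower() for w in generated[:n]}
    reference_top = [w.lower() for w in reference[:n]]
    if not reference_top:
        return 0.0
    hits = sum(1 for w in reference_top if w in generated_set)
    return hits / len(reference_top)


def mean_overlap(pairs, n=20):
    """Average top-n overlap across (generated, reference) list pairs."""
    scores = [top_n_overlap(gen, ref, n) for gen, ref in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Invented toy collocate lists (NOT data from the study), n=3 for brevity.
pairs = [
    # node "rain": LLM output vs. corpus-derived collocates, 2 of 3 shared
    (["heavy", "light", "acid"], ["heavy", "acid", "torrential"]),
    # node "decision": 1 of 3 shared
    (["make", "take", "reach"], ["make", "difficult", "final"]),
]

print(f"{mean_overlap(pairs, n=3):.1%}")  # prints 50.0%
```

A different definition, such as intersection over union of the two lists, would give different numbers, so any comparison with the reported figures depends on the study's actual procedure.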
Pages: 9