Getting into bed with embeddings? A comparison of collocations and word embeddings for corpus-assisted discourse analysis

Cited by: 0
Authors
Batchelor, Jordan [1]
Affiliation
[1] 1179 E Vermont Rd, Gilbert, AZ 85295 USA
Source
APPLIED CORPUS LINGUISTICS | 2024 / Vol. 4 / Issue 03
Keywords
Collocation analysis; Word embeddings; Corpus-assisted discourse studies; Discourse analysis
DOI
10.1016/j.acorp.2024.100117
Chinese Library Classification (CLC) number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
This paper discusses two approaches for identifying lexical patterns in discourse, namely the corpus linguistic method of collocation analysis and the natural language processing method of word embeddings. While both approaches can identify lexical patterns, they approach the task with different underlying frameworks, and the extent to which their results resemble one another has not been directly compared. This study uses two corpora, five collocation measures, and two word embedding algorithms to generate such comparisons. Results generally support the notion that many word pairs with similar embeddings are collocates, and that, to a lesser extent, many collocates have similar word embeddings. However, a major difference is that word pairs with similar embeddings do not need to co-occur often, or at all. Moreover, systematic differences in the kinds of words highlighted between the two word embedding algorithms were found and are discussed.
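The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not name the five collocation measures or the two embedding algorithms, so pointwise mutual information (PMI) is used here as one common collocation measure and cosine similarity as the usual way to compare embeddings. The function names, the window size, and the simplified probability estimates are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=2):
    """Simplified PMI for word pairs that co-occur within `window` tokens.

    Assumption: probabilities are estimated from raw token counts and
    raw pair counts; published collocation measures differ in detail.
    """
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, node in enumerate(tokens):
        # Look only to the right so each co-occurrence is counted once.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair_freq[tuple(sorted((node, tokens[j])))] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_freq.values())
    scores = {}
    for (a, b), freq in pair_freq.items():
        p_ab = freq / n_pairs
        p_a = word_freq[a] / n_tokens
        p_b = word_freq[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


# Toy corpus: "strong" and "tea" co-occur; "strong" and "coffee" never do.
tokens = "strong tea strong tea weak coffee".split()
scores = pmi_scores(tokens, window=1)

# Hypothetical embedding vectors, invented for this example only.
vec_tea = [0.9, 0.1]
vec_coffee = [0.8, 0.2]
similarity = cosine(vec_tea, vec_coffee)
```

A collocation measure can only score pairs that actually co-occur, so a pair such as ("coffee", "strong") has no entry in `scores`. Cosine similarity, in contrast, is defined for any two vectors, which is consistent with the abstract's point that word pairs with similar embeddings need not co-occur at all.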
Pages: 9