Getting into bed with embeddings? A comparison of collocations and word embeddings for corpus-assisted discourse analysis

Cited by: 0
Author
Batchelor, Jordan [1]
Affiliation
[1] 1179 E Vermont Rd, Gilbert, AZ 85295 USA
Source
APPLIED CORPUS LINGUISTICS | 2024, Vol. 4, No. 3
Keywords
Collocation analysis; Word embeddings; Corpus-assisted discourse studies; Discourse analysis
DOI
10.1016/j.acorp.2024.100117
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper discusses two approaches for identifying lexical patterns in discourse, namely the corpus linguistic method of collocation analysis and the natural language processing method of word embeddings. While both approaches can identify lexical patterns, they approach the task with different underlying frameworks, and the extent to which their results resemble one another has not been directly compared. This study uses two corpora, five collocation measures, and two word embedding algorithms to generate such comparisons. Results generally support the notion that many word pairs with similar embeddings are collocates, and that, to a lesser extent, many collocates have similar word embeddings. However, a major difference is that word pairs with similar embeddings do not need to co-occur often, or at all. Moreover, systematic differences in the kinds of words highlighted between the two word embedding algorithms were found and are discussed.
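As a rough illustration of the two kinds of evidence the abstract contrasts, the sketch below scores word pairs by pointwise mutual information (PMI) and compares vectors by cosine similarity. This is not the paper's code: the abstract does not name its five collocation measures or its two embedding algorithms, so PMI, the window size, the toy corpus, and the hand-written vectors are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=2):
    """PMI for unordered word pairs that co-occur within `window` tokens.

    PMI is an assumed stand-in for a collocation measure; the abstract
    does not say which measures the study used.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, word in enumerate(tokens):
        # Pair each token with the next `window` tokens to its right.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair_counts[tuple(sorted((word, tokens[j])))] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = word_counts[a] / n_tokens
        p_b = word_counts[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): "tea" and "coffee" never appear side by side.
tokens = "strong tea strong tea weak coffee".split()
scores = pmi_scores(tokens, window=1)

# Hand-written toy vectors (hypothetical, not trained embeddings).
toy_vectors = {"tea": [0.9, 0.1], "coffee": [0.8, 0.2], "strong": [0.1, 0.9]}
tea_coffee_similarity = cosine(toy_vectors["tea"], toy_vectors["coffee"])
```

In this toy setup the pair ("coffee", "tea") has no PMI score because the two words never co-occur within the window, yet their assumed vectors are close. That mirrors the abstract's point that word pairs with similar embeddings need not co-occur often, or at all.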
Pages: 9