Stability of Word Embeddings Using Word2Vec

Cited by: 6
Authors
Chugh, Mansi [1]
Whigham, Peter A. [1]
Dick, Grant [1]
Affiliations
[1] Univ Otago, Dept Informat Sci, Dunedin, New Zealand
Source
AI 2018: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2018 / Vol. 11320
Keywords
Word2vec; Embedding dimension; Similarity; Stability
DOI
10.1007/978-3-030-03991-2_73
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The word2vec model has been previously shown to be successful in creating numerical representations of words (word embeddings) that capture the semantic and syntactic meanings of words. This study examines the issue of model stability in terms of how consistent these representations are given a specific corpus and set of model parameters. Specifically, the study considers the impact of word embedding dimension size and frequency of words on stability. Stability is measured by comparing the neighborhood of words in the word vector space model. Our results demonstrate that the dimension size of word embeddings has a significant effect on the consistency of the model. In addition, the effect of the frequency of the target words on stability is identified. An approach to mitigate the effects of word frequency on stability is proposed.
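The neighborhood comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact metric, so the choices here are assumptions: cosine similarity for ranking neighbors, Jaccard overlap of the top-k neighbor sets as the stability score, and the hypothetical function names `nearest_neighbors` and `neighborhood_overlap`. The toy vectors stand in for two word2vec training runs on the same corpus with the same parameters; they are not data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, target, k):
    """Return the set of the k words closest to `target` by cosine similarity."""
    scores = [
        (cosine(embeddings[target], vec), word)
        for word, vec in embeddings.items()
        if word != target
    ]
    scores.sort(reverse=True)
    return {word for _, word in scores[:k]}


def neighborhood_overlap(run_a, run_b, target, k=2):
    """Jaccard overlap of the target's k-nearest-neighbor sets in two runs.

    Assumption: 1.0 means identical neighborhoods (stable), 0.0 means disjoint.
    """
    a = nearest_neighbors(run_a, target, k)
    b = nearest_neighbors(run_b, target, k)
    return len(a & b) / len(a | b)


# Toy vectors standing in for two training runs with identical settings.
run_a = {
    "king": [1.0, 0.0],
    "queen": [0.9, 0.1],
    "prince": [0.8, 0.3],
    "apple": [0.0, 1.0],
}
run_b = {
    "king": [0.0, 1.0],
    "queen": [0.1, 0.9],
    "prince": [0.7, 0.6],
    "apple": [0.2, 1.0],
}

overlap = neighborhood_overlap(run_a, run_b, "king", k=2)
print(f"neighborhood overlap for 'king': {overlap:.3f}")
```

In practice the two dictionaries would be filled from the word vectors of separately trained models, and the score would be averaged over many target words, grouped by embedding dimension or word frequency to examine the effects the abstract mentions.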
Pages: 812-818
Number of pages: 7