Filtering Communities in Word Co-Occurrence Networks to Foster the Emergence of Meaning

Cited by: 0
Authors
Beranger, Anna [1 ]
Dugue, Nicolas [1 ]
Guillot, Simon [1 ]
Prouteau, Thibault [1 ]
Affiliation
[1] Univ Mans, LIUM, Ave Olivier Messiaen, F-72000 Le Mans, France
Source
COMPLEX NETWORKS & THEIR APPLICATIONS XII, VOL 1, COMPLEX NETWORKS 2023 | 2024 / Vol. 1141
Keywords
Word co-occurrence networks; community detection; word embedding; linguistics; interpretability
DOI
10.1007/978-3-031-53468-3_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With SINr, we introduced a way to design graph and word embeddings based on community detection. Contrary to deep learning approaches, this approach does not require much compute and has been shown to be state of the art for interpretability in the context of word embeddings. In this paper, we investigate how filtering communities detected on word co-occurrence networks can improve the performance of the approach. Community detection algorithms tend to uncover communities whose sizes follow a power-law distribution. Naturally, the number of activations per dimension in SINr also follows a power law: a few dimensions are activated by many words, and many dimensions are activated by a few words. By filtering this distribution, removing part of its head and tail, we show improvement on the intrinsic evaluation of the embeddings while dividing their dimensionality by five. In addition, we show that these results are stable across several runs, thus defining a subset of distinctive features to describe a given corpus.
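The head-and-tail filtering described in the abstract can be illustrated with a minimal sketch. This is not the SINr implementation: the function names (`activation_counts`, `filter_dimensions`), the thresholds, and the toy vectors are all hypothetical, and the sketch assumes each dimension corresponds to one community and that a word "activates" a dimension when its value there is non-zero.

```python
def activation_counts(embeddings):
    """Count, for each dimension, how many words have a non-zero value."""
    n_dims = len(next(iter(embeddings.values())))
    counts = [0] * n_dims
    for vector in embeddings.values():
        for dim, value in enumerate(vector):
            if value != 0:
                counts[dim] += 1
    return counts


def filter_dimensions(embeddings, min_count, max_count):
    """Keep only dimensions activated by between min_count and max_count words.

    Dropping dimensions above max_count trims the head of the distribution
    (dimensions activated by many words); dropping those below min_count
    trims the tail (dimensions activated by very few words).
    """
    counts = activation_counts(embeddings)
    kept = [d for d, c in enumerate(counts) if min_count <= c <= max_count]
    filtered = {w: [v[d] for d in kept] for w, v in embeddings.items()}
    return filtered, kept


# Toy data (assumption): 4 words, 4 community-based dimensions.
toy = {
    "cat":  [0.9, 0.4, 0.0, 0.0],
    "dog":  [0.8, 0.5, 0.0, 0.0],
    "car":  [0.7, 0.0, 0.6, 0.0],
    "road": [0.6, 0.0, 0.7, 0.1],
}

# Thresholds are illustrative only; the paper's actual cut-offs are not given here.
filtered, kept = filter_dimensions(toy, min_count=2, max_count=3)
print(kept)
print(filtered["cat"])
```

In this toy case the first dimension is activated by every word and the last by a single word, so both are discarded and only the two middle dimensions remain. How the thresholds should be chosen on a real corpus is a question the sketch leaves open.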
Pages: 377-388
Page count: 12