Geometric Estimation of Specificity within Embedding Spaces

Cited by: 11
Authors
Arabzadeh, Negar [1]
Zarrinkalam, Fattane [1]
Jovanovic, Jelena [2]
Bagheri, Ebrahim [1]
Affiliations
[1] Ryerson Univ, Toronto, ON, Canada
[2] Univ Belgrade, Fac Org Sci, Belgrade, Serbia
Source
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19) | 2019
DOI
10.1145/3357384.3358152
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Specificity is the level of detail at which a given term is represented. Existing approaches to estimating term specificity are primarily dependent on corpus-level frequency statistics. In this work, we explore how neural embeddings can be used to define corpus-independent specificity metrics. Particularly, we propose to measure term specificity based on the distribution of terms in the neighborhood of the given term in the embedding space. The intuition is that a term that is surrounded by other terms in the embedding space is more likely to be specific while a term surrounded by less closely related terms is more likely to be generic. On this basis, we leverage geometric properties between embedded terms to define three groups of metrics: (1) neighborhood-based, (2) graph-based and (3) cluster-based metrics. Moreover, we employ learning-to-rank techniques to estimate term specificity in a supervised approach by employing the three proposed groups of metrics. We curate and publicly share a test collection of term specificity measurements defined based on Wikipedia's category hierarchy. We report on our experiments through metric performance comparison, ablation study and comparison against the state-of-the-art baselines.
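The neighborhood intuition in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's metric definitions, so everything here is an assumption made for illustration: the function name `neighborhood_size`, the cosine-similarity threshold `epsilon`, and the toy 2-D vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def neighborhood_size(embeddings, term_index, epsilon=0.7):
    """Count the terms whose cosine similarity to the target term is >= epsilon.

    Hypothetical neighborhood-style score: a larger count means the term sits
    in a denser region of the embedding space.
    """
    # Normalise rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit[term_index]
    # Exclude the term itself from its own neighborhood.
    mask = np.ones(len(sims), dtype=bool)
    mask[term_index] = False
    return int(np.sum(sims[mask] >= epsilon))

# Toy vectors, made up for illustration (not real embeddings).
toy = np.array([
    [1.0, 0.0],  # term 0
    [0.9, 0.1],  # term 1, close to term 0
    [0.8, 0.2],  # term 2, close to term 0
    [0.0, 1.0],  # term 3, far from the others
])

dense_count = neighborhood_size(toy, 0)   # term in a dense region
sparse_count = neighborhood_size(toy, 3)  # term in a sparse region
```

Under the abstract's intuition, the term with the denser neighborhood would be treated as the more specific one. A threshold-based count is only one possible choice; graph-based or cluster-based variants would need additional structure not shown here.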
Pages: 2109-2112
Page count: 4
Related papers
6 items in total
[1] He B, 2004, LECT NOTES COMPUT SC, V3246, P43
[2] Kapanipathi Pavan, 2014, The Semantic Web: Trends and Challenges. 11th International Conference (ESWC 2014). Proceedings: LNCS 8465, P99, DOI 10.1007/978-3-319-07443-6_8
[3] Li Y, 2016, P COLING 2016 26 INT, V2016, P2678
[4] Mitra, Bhaskar; Craswell, Nick. An Introduction to Neural Information Retrieval [J]. FOUNDATIONS AND TRENDS IN INFORMATION RETRIEVAL, 2018, 13 (01): 1-126
[5] SPARCKJONES, K. STATISTICAL INTERPRETATION OF TERM SPECIFICITY AND ITS APPLICATION IN RETRIEVAL [J]. JOURNAL OF DOCUMENTATION, 1972, 28 (01): 11-+
[6] Zamani, Hamed; Croft, W. Bruce; Culpepper, J. Shane. Neural Query Performance Prediction using Weak Supervision from Multiple Signals [J]. ACM/SIGIR PROCEEDINGS 2018, 2018: 105-114