Embeddings Evaluation Using a Novel Measure of Semantic Similarity

Cited by: 0

Authors:

Anna Giabelli

Lorenzo Malandri

Fabio Mercorio

Mario Mezzanzanica

Navid Nobani

Affiliations:

[1] Univ. of Milan-Bicocca, Dept. of Informatics, Systems & Communication

[2] University of Milano-Bicocca, CRISP Research Centre

[3] Univ. of Milan-Bicocca, Dept. of Statistics and Quantitative Methods

[4] Digital Attitude

Source:

Cognitive Computation | 2022 / Vol. 14


Abstract:

Lexical taxonomies and distributional representations are widely used in NLP applications, including semantic similarity measurement. Recently, several scholars have proposed new approaches that combine these resources into a unified representation preserving both distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words through a new measure we previously developed, the Hierarchical Semantic Similarity (HSS), which we show outperforms previous measures on several benchmark tasks. Then, we train several embedding models on a text corpus and select the best one, that is, the model that maximizes the correlation between the HSS and the cosine similarity of the word pairs that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other benchmark measures of semantic similarity. We use the vectors of the four selected embeddings as features of machine learning models to perform several NLP tasks. Performance on those tasks constitutes an extrinsic evaluation of the criterion used to select the best embedding (i.e., the adopted semantic similarity measure). Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in a taxonomy on a benchmark intrinsic evaluation, and (ii) the embedding selected through TaxoVec clearly outperforms the embeddings selected by the competing measures on benchmark NLP tasks. We implemented HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at https://pypi.org/project/TaxoSS.
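The model-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation and not the TaxoSS API: it assumes the taxonomic similarity scores (for example, HSS values) have already been computed and are supplied as a dict keyed by word pair, that each trained embedding model is a plain word-to-vector mapping, and it uses Spearman rank correlation, which is an assumption since the abstract only says "correlation". The names `select_embedding`, `average_ranks`, `spearman`, and `cosine` are hypothetical helpers introduced for illustration.

```python
import math
from typing import Dict, List, Mapping, Sequence, Tuple

Vector = Sequence[float]
WordPair = Tuple[str, str]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def select_embedding(
    taxo_sim: Mapping[WordPair, float],
    models: Mapping[str, Mapping[str, Vector]],
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical helper: pick the model whose cosine scores best track taxo_sim.

    Assumption: Spearman correlation is used as the selection criterion.
    """
    scores: Dict[str, float] = {}
    for name, vectors in models.items():
        # Keep only taxonomy pairs whose two words are in this model's vocabulary.
        pairs = [p for p in taxo_sim if p[0] in vectors and p[1] in vectors]
        if len(pairs) < 2:
            continue  # a correlation needs at least two pairs
        gold = [taxo_sim[p] for p in pairs]
        pred = [cosine(vectors[a], vectors[b]) for a, b in pairs]
        scores[name] = spearman(gold, pred)
    if not scores:
        raise ValueError("no model covers at least two taxonomy word pairs")
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy taxonomic scores and two toy 2-d "embedding models" (illustrative data only).
    taxo_sim = {("dog", "cat"): 0.8, ("dog", "car"): 0.1, ("cat", "car"): 0.15}
    models = {
        "model_a": {"dog": [1.0, 0.1], "cat": [0.9, 0.2], "car": [0.0, 1.0]},
        "model_b": {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "car": [0.9, 0.1]},
    }
    best, scores = select_embedding(taxo_sim, models)
    print(best, scores)  # model_a is selected; its rank order matches taxo_sim
```

In the toy run, `model_a` ranks the three pairs in the same order as the taxonomic scores (correlation close to 1.0), while `model_b` reverses them (close to -1.0), so `model_a` is selected. Rank correlation is a natural choice here because cosine similarities and taxonomy-based scores live on different scales, but if a different correlation is preferred, only `spearman` needs to be swapped out.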


Pages: 749–763

Number of pages: 14
