Embeddings Evaluation Using a Novel Measure of Semantic Similarity

Authors
Anna Giabelli
Lorenzo Malandri
Fabio Mercorio
Mario Mezzanzanica
Navid Nobani
Affiliations
[1] Univ. of Milan-Bicocca, Dept. of Informatics, Systems & Communication
[2] University of Milano Bicocca, CRISP Research Centre
[3] Univ. of Milan-Bicocca, Dept. of Statistics and Quantitative Methods
[4] Digital Attitude
Source
Cognitive Computation | 2022 / Vol. 14
Abstract
Lexical taxonomies and distributional representations are widely used to support a broad range of NLP applications, including semantic similarity measurement. Recently, several scholars have proposed approaches that combine those resources into a unified representation preserving both distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words using a measure we previously developed, the Hierarchical Semantic Similarity (HSS), which we show outperforms previous measures on several benchmark tasks. We then train several embedding models on a text corpus and select the best one, that is, the model that maximizes the correlation between the HSS and the cosine similarity of the word pairs that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other benchmark semantic similarity measures. We use the vectors of the four selected embeddings as features for machine learning models on several NLP tasks. Performance on those tasks constitutes an extrinsic evaluation of the selection criterion (i.e., the adopted semantic similarity measure). Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in a taxonomy on a benchmark intrinsic evaluation and (ii) the embedding selected through TaxoVec clearly outperforms the embeddings selected by the competing measures on benchmark NLP tasks. We implemented the HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at https://pypi.org/project/TaxoSS.
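The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the TaxoSS API: the function names, the toy vectors, and the toy similarity scores are all hypothetical, and Pearson correlation is an assumption, since the abstract does not say which correlation coefficient is used.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def selection_score(embedding, taxonomy_similarity):
    """Correlate a taxonomy-based similarity with embedding cosine similarity.

    embedding: dict mapping word -> vector (hypothetical format).
    taxonomy_similarity: dict mapping (word1, word2) -> score, standing in
    for a measure such as HSS (the scores here are not computed by HSS).
    Only pairs whose two words both appear in the embedding are used.
    Pearson correlation is an assumption of this sketch.
    """
    reference, predicted = [], []
    for (w1, w2), score in taxonomy_similarity.items():
        if w1 in embedding and w2 in embedding:
            reference.append(score)
            predicted.append(cosine(embedding[w1], embedding[w2]))
    if len(reference) < 2:
        raise ValueError("need at least two shared word pairs")
    return float(np.corrcoef(reference, predicted)[0, 1])


def select_best(candidates, taxonomy_similarity):
    """Return the name of the highest-scoring embedding and all scores."""
    scores = {
        name: selection_score(emb, taxonomy_similarity)
        for name, emb in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration only.
taxonomy_similarity = {
    ("cat", "dog"): 0.9,
    ("cat", "car"): 0.1,
    ("dog", "car"): 0.1,
}
candidates = {
    "model_a": {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]},
    "model_b": {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [0.9, 0.1]},
}

best, scores = select_best(candidates, taxonomy_similarity)
print(best, scores)
```

In this toy setup, model_a places "cat" and "dog" close together, as the taxonomy scores suggest, so it receives the higher correlation and is selected.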
Pages: 749 - 763
Number of pages: 14