Embeddings Evaluation Using a Novel Measure of Semantic Similarity

Authors
Anna Giabelli
Lorenzo Malandri
Fabio Mercorio
Mario Mezzanzanica
Navid Nobani
Affiliations
[1] Univ. of Milan-Bicocca, Dept. of Informatics, Systems & Communication
[2] University of Milano Bicocca, CRISP Research Centre
[3] Univ. of Milan-Bicocca, Dept. of Statistics and Quantitative Methods
[4] Digital Attitude
Source
Cognitive Computation | 2022 / Vol. 14
Abstract
Lexical taxonomies and distributional representations are widely used to support a broad range of NLP applications, including semantic similarity measurement. Recently, several scholars have proposed approaches that combine these resources into a unified representation preserving both distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words using a measure we previously developed, the Hierarchical Semantic Similarity (HSS), which we show outperforms previous measures on several benchmark tasks. We then train several embedding models on a text corpus and select the best one, that is, the model that maximizes the correlation between the HSS and the cosine similarity of the word pairs that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other benchmark semantic similarity measures. We use the vectors of the four selected embeddings as features of machine learning models that perform several NLP tasks. Performance on those tasks constitutes an extrinsic evaluation of the criterion used to select the best embedding (i.e., the adopted semantic similarity measure). Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in a taxonomy on a benchmark intrinsic evaluation and (ii) the embedding selected through TaxoVec clearly outperforms the embeddings selected by the competing measures on benchmark NLP tasks. We implemented the HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at https://pypi.org/project/TaxoSS.
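The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `pearson`, `score_model`, `select_model`), the toy vectors, and the placeholder similarity scores are all assumptions made for illustration. The abstract does not say which correlation coefficient is used, so Pearson correlation is assumed here, and the taxonomy scores are hand-written stand-ins rather than real HSS output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation (assumed choice; the abstract only says 'correlation').

    Raises ZeroDivisionError if either input has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def score_model(vectors, taxo_sim):
    """Correlate taxonomy similarity with cosine similarity.

    Only word pairs present in both the taxonomy scores and the
    embedding vocabulary are considered.
    """
    reference, predicted = [], []
    for (w1, w2), sim in taxo_sim.items():
        if w1 in vectors and w2 in vectors:
            reference.append(sim)
            predicted.append(cosine(vectors[w1], vectors[w2]))
    return pearson(reference, predicted)


def select_model(models, taxo_sim):
    """Return the name of the embedding model with the highest correlation."""
    return max(models, key=lambda name: score_model(models[name], taxo_sim))


# Hypothetical taxonomy-based similarity scores (placeholders, not real HSS values).
taxo_sim = {("cat", "dog"): 0.9, ("cat", "car"): 0.1, ("dog", "car"): 0.2}

# Two hypothetical embedding models with 2-dimensional toy vectors.
models = {
    "model_a": {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]},
    "model_b": {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [0.9, 0.1]},
}

best = select_model(models, taxo_sim)
print(best)
```

In this toy setup, `model_a` places "cat" and "dog" close together, matching the placeholder taxonomy scores, so it is the one selected. In practice the scores would come from a taxonomy-based measure such as HSS and the vectors from embedding models trained on the corpus.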
Pages: 749-763
Page count: 14