Embeddings Evaluation Using a Novel Measure of Semantic Similarity

Authors
Anna Giabelli
Lorenzo Malandri
Fabio Mercorio
Mario Mezzanzanica
Navid Nobani
Affiliations
[1] Univ. of Milan-Bicocca, Dept. of Informatics, Systems & Communication
[2] University of Milano Bicocca, CRISP Research Centre
[3] Univ. of Milan-Bicocca, Dept. of Statistics and Quantitative Methods
[4] Digital Attitude
Source
Cognitive Computation | 2022 / Vol. 14
Abstract
Lexical taxonomies and distributional representations are widely used to support a range of NLP applications, including semantic similarity measurement. Recently, several scholars have proposed approaches that combine these resources into a unified representation preserving both distributional and knowledge-based lexical features. In this paper, we propose and implement TaxoVec, a novel approach to selecting word embeddings based on their ability to preserve taxonomic similarity. In TaxoVec, we first compute the pairwise semantic similarity between taxonomic words using a measure we previously developed, the Hierarchical Semantic Similarity (HSS), which we show outperforms previous measures on several benchmark tasks. Then, we train several embedding models on a text corpus and select the best one, that is, the model that maximizes the correlation between the HSS and the cosine similarity over the pairs of words that appear in both the taxonomy and the corpus. To evaluate TaxoVec, we repeat the embedding selection process using three other benchmark measures of semantic similarity. We use the vectors of the four selected embeddings as features of machine learning models that perform several NLP tasks. The performance on those tasks constitutes an extrinsic evaluation of the selection criterion (i.e., the adopted semantic similarity measure). Experimental results show that (i) HSS outperforms state-of-the-art measures of semantic similarity in a taxonomy on a benchmark intrinsic evaluation and (ii) the embedding selected through TaxoVec clearly outperforms the embeddings selected by the competing measures on benchmark NLP tasks. We implemented the HSS, together with other benchmark measures of semantic similarity, as a full-fledged Python package called TaxoSS, whose documentation is available at https://pypi.org/project/TaxoSS.
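The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vectors, and the reference similarity values are hypothetical placeholders (the values stand in for HSS scores and are not computed by TaxoSS), and the use of Pearson correlation is an assumption, since the abstract does not say which correlation coefficient is used.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def selection_score(embedding, reference_similarity):
    """Correlate a reference similarity with the embedding's cosine similarity.

    Only pairs whose two words are both in the embedding vocabulary are used,
    mirroring "words that appear in both the taxonomy and the corpus".
    """
    ref, cos = [], []
    for (w1, w2), score in reference_similarity.items():
        if w1 in embedding and w2 in embedding:
            ref.append(score)
            cos.append(cosine_similarity(embedding[w1], embedding[w2]))
    # Assumption: Pearson correlation; a rank correlation could be used instead.
    return float(np.corrcoef(ref, cos)[0, 1])


def select_best_embedding(candidates, reference_similarity):
    """Return the name of the highest-scoring candidate and all scores."""
    scores = {
        name: selection_score(emb, reference_similarity)
        for name, emb in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical reference scores standing in for HSS values.
reference = {
    ("cat", "dog"): 0.80,
    ("cat", "car"): 0.10,
    ("dog", "car"): 0.15,
    ("car", "truck"): 0.90,
}

# Two toy "trained" embedding models (2-dimensional for readability).
candidates = {
    "model_a": {"cat": [1.0, 0.1], "dog": [0.9, 0.2],
                "car": [0.1, 1.0], "truck": [0.2, 0.9]},
    "model_b": {"cat": [1.0, 0.0], "dog": [0.0, 1.0],
                "car": [1.0, 0.1], "truck": [0.0, 1.0]},
}

best, scores = select_best_embedding(candidates, reference)
print(best, scores)
```

In this toy setup, `model_a` places taxonomically close words near each other, so its cosine similarities track the reference scores and it is selected. Swapping `reference` for the scores of a different similarity measure reproduces, in miniature, the comparison of selection criteria that the abstract describes.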
Pages: 749-763
Number of pages: 14