SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Cited by: 0
Authors
Saeed, Sumaira [1]
Rajput, Quratulain [1]
Haider, Sajjad [1]
Affiliations
[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan
Keywords
Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology
DOI
10.1016/j.ipm.2024.103771
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Measuring semantic similarity between two pieces of text is a widely known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate the diagnosis process, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine/deep learning and lack sufficient explanations for their results, which limits their adoption in critical domains like healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that uniquely combines ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. The SUMEX framework outperforms the embedding-based model on the ClinicalSTS benchmark datasets, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. Domain experts evaluated the quality of the natural-language explanations that SUMEX generates for its results. The explanations were rated as high quality, scoring 90% for Completeness and 93% for Correctness. ChatGPT was also used to compute similarity scores and generate explanations; in these experiments, the SUMEX framework performed better than ChatGPT.
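The abstract reports results as top-k precision and false-positive rate but does not define either measure. As a rough illustration only, the sketch below uses the standard textbook definitions; the function names and the toy inputs are hypothetical and are not taken from the paper, and the authors' exact evaluation protocol may differ.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant.

    Assumption: the denominator is always k, even if fewer than k
    items were retrieved (a common convention, not the paper's stated one).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def false_positive_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """FP / (FP + TN): share of truly dissimilar pairs predicted as similar."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy data (invented for illustration): relevance of five retrieved notes,
# ordered by predicted similarity to a query note.
ranking = [True, False, True, True, False]
top3 = precision_at_k(ranking, 3)
top5 = precision_at_k(ranking, 5)

# Toy binary similar/dissimilar labels and predictions for five text pairs.
fpr = false_positive_rate(
    [True, False, False, False, True],
    [True, True, False, False, False],
)
```

Under these definitions, a lower false-positive rate means fewer dissimilar note pairs are wrongly reported as similar, which is the direction of improvement the abstract describes.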
Pages: 22