SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Cited by: 0
Authors
Saeed, Sumaira [1]
Rajput, Quratulain [1]
Haider, Sajjad [1]
Affiliations
[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan
Keywords
Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology
DOI
10.1016/j.ipm.2024.103771
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Measuring semantic similarity between two pieces of text is a widely known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate the diagnosis process, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine or deep learning and lack sufficient explanations for their results, which limits their adoption in critical domains like healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that uniquely combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. On the ClinicalSTS benchmark datasets, the SUMEX framework outperforms the embedding-based model, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. SUMEX also generates natural-language explanations for its results, and domain experts evaluated their quality. The results show that the generated explanations are of high quality, scoring 90% and 93% on the measures of Completeness and Correctness, respectively. In addition, ChatGPT was used to compute similarity scores and generate explanations. The experiments show that the SUMEX framework performed better than ChatGPT.
Pages: 22