SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Times cited: 0
Authors
Saeed, Sumaira [1]
Rajput, Quratulain [1]
Haider, Sajjad [1]
Affiliation
[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan
Keywords
Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology
DOI
10.1016/j.ipm.2024.103771
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Measuring semantic similarity between two pieces of text is a well-known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate diagnosis, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine learning or deep learning and do not sufficiently explain their results, which limits their adoption in critical domains such as healthcare. This paper presents SUMEX (Semantic textUal siMilarity and EXplanation generation), a hybrid framework that uniquely combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The framework's primary strength is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec model trained on clinical notes. On the ClinicalSTS benchmark datasets, SUMEX outperforms the embedding-based model, improving average precision by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. SUMEX also generates natural-language explanations for its results, and domain experts evaluated the quality of these explanations. The generated explanations were rated as high quality, scoring 90% for Completeness and 93% for Correctness. In addition, ChatGPT was used to produce similarity scores and explanations for comparison; in these experiments, SUMEX performed better than ChatGPT.
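The abstract reports results as top-k precision (top-five and top-three) and false-positive rate. As a rough illustration of how such metrics are commonly computed, the Python sketch below implements their standard definitions. It assumes that each query patient has a ranked list of retrieved patients plus a set of patients judged similar, and that similarity scores have already been thresholded into similar/dissimilar predictions for the false-positive rate. The function names and toy data are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def mean_precision_at_k(
    rankings: Dict[str, List[str]], relevance: Dict[str, Set[str]], k: int
) -> float:
    """Average precision@k over all query patients in `rankings`."""
    scores = [
        precision_at_k(ranked, relevance.get(query, set()), k)
        for query, ranked in rankings.items()
    ]
    if not scores:
        raise ValueError("rankings must contain at least one query")
    return sum(scores) / len(scores)


def false_positive_rate(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """FP / (FP + TN): share of truly dissimilar pairs wrongly predicted as similar."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return fp / (fp + tn) if fp + tn else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: two query patients, each with five ranked neighbours.
    rankings = {"p1": ["a", "b", "c", "d", "e"], "p2": ["f", "g", "h", "i", "j"]}
    relevance = {"p1": {"a", "c", "x"}, "p2": {"g"}}
    print("mean precision@3:", mean_precision_at_k(rankings, relevance, 3))
    print("mean precision@5:", mean_precision_at_k(rankings, relevance, 5))

    # Hypothetical pair labels after thresholding similarity scores (True = similar).
    predicted = [True, True, False, False, True]
    actual = [True, False, False, True, False]
    print("false-positive rate:", false_positive_rate(predicted, actual))
```

On the toy data, query p1 finds two relevant patients in its top three and p2 finds one, giving precision@3 values of 2/3 and 1/3. Dividing by k rather than by the number of retrieved items is one common convention; other evaluation setups may normalize differently.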
Pages: 22
Related papers
50 in total
  • [21] A framework for automatic causality extraction using semantic similarity
    Kim, Sanghee
    Bracewell, Rob H.
    Wallace, Ken M.
    27TH COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, VOL 2, PTS A AND B 2007: PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2008: 831-840
  • [22] Semantic Framework for Interactive Animation Generation
    Liang, Hui
    Chang, Jian
    Wang, Meili
    Chen, Can
    Zhang, Jian Jun
    PROCEEDINGS VRCAI 2016: 15TH ACM SIGGRAPH CONFERENCE ON VIRTUAL-REALITY CONTINUUM AND ITS APPLICATIONS IN INDUSTRY, 2016: 137-145
  • [23] Semantic Similarity Measures for the Generation of Science Tests in Basque
    Aldabe, Itziar
    Maritxalar, Montse
    IEEE TRANSACTIONS ON LEARNING TECHNOLOGIES, 2014, 7(4): 375-387
  • [24] Improved Hybrid Semantic Similarity Algorithm for Terminology Application
    Wei, Tong
    Jia, Yangli
    Zhang, Zhenling
    Roche, Julien
    Roche, Christophe
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016: 1734-1738
  • [25] Sentence Embedding and Convolutional Neural Network for Semantic Textual Similarity Detection in Arabic Language
    Mahmoud, Adnen
    Zrigui, Mounir
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44(11): 9263-9274
  • [27] Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models
    Silva, A.
    Lozkins, A.
    Bertoldi, L. R.
    Rigo, S.
    Bure, V. M.
    VESTNIK SANKT-PETERBURGSKOGO UNIVERSITETA SERIYA 10 PRIKLADNAYA MATEMATIKA INFORMATIKA PROTSESSY UPRAVLENIYA, 2019, 15(2): 235-244
  • [28] SupMPN: Supervised Multiple Positives and Negatives Contrastive Learning Model for Semantic Textual Similarity
    Dehghan, Somaiyeh
    Amasyali, Mehmet Fatih
    APPLIED SCIENCES-BASEL, 2022, 12(19)
  • [29] Evaluating Semantic Textual Similarity in Clinical Sentences Using Deep Learning and Sentence Embeddings
    Antunes, Rui
    Silva, Joao Figueira
    Matos, Sergio
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020: 662-669
  • [30] Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models
    Yang, Xi
    He, Xing
    Zhang, Hansi
    Ma, Yinghan
    Bian, Jiang
    Wu, Yonghui
    JMIR MEDICAL INFORMATICS, 2020, 8(11)