SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Cited by: 0

Authors:

Saeed, Sumaira [1]

Rajput, Quratulain [1]

Haider, Sajjad [1]

Affiliations:

[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan

Source:

INFORMATION PROCESSING & MANAGEMENT | 2024 / Vol. 61 / Issue 05

Keywords:

Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology

DOI:

10.1016/j.ipm.2024.103771

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Measuring semantic similarity between two pieces of text is a widely known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate the diagnosis process, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine/deep learning and lack sufficient explanations for their results, limiting their adoption in critical domains like healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that uniquely combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. The SUMEX framework outperforms the embedding-based model on the ClinicalSTS benchmark datasets, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. SUMEX also generates explanations for its results in natural language, and domain experts evaluated their quality. The results show that the generated explanations are of high quality, scoring 90% and 93% on the measures of Completeness and Correctness, respectively. In addition, ChatGPT was used to compute similarity scores and generate explanations. The experiments show that the SUMEX framework performed better than ChatGPT.
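The abstract reports results as top-k precision and false-positive rate. As a reading aid, the sketch below shows how these two metrics are commonly defined. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `precision_at_k` and `false_positive_rate` and the toy inputs are hypothetical, and the paper may define or average its metrics differently.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant.

    ranked_relevance[i] is True when the i-th ranked result is relevant.
    (Hypothetical helper; the common textbook definition is assumed.)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def false_positive_rate(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """FP / (FP + TN): share of truly dissimilar pairs flagged as similar."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    true_neg = sum((not p) and (not a) for p, a in zip(predicted, actual))
    negatives = false_pos + true_neg
    # With no negative examples the rate is undefined; 0.0 is returned by convention here.
    return false_pos / negatives if negatives else 0.0


# Toy data (invented for illustration only).
ranked = [True, False, True, True, False]
top3 = precision_at_k(ranked, 3)   # 2 of the top 3 are relevant
top5 = precision_at_k(ranked, 5)   # 3 of the top 5 are relevant

predicted = [True, True, False, False]
actual = [True, False, False, False]
fpr = false_positive_rate(predicted, actual)  # 1 false positive among 3 negatives
```

An "average" top-k precision, as quoted in the abstract, would typically be the mean of `precision_at_k` over all query notes; that averaging step is an assumption here as well.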


Pages: 22
