Evaluation of Five Sentence Similarity Models on Electronic Medical Records

Cited by: 3

Authors:

Chen, Qingyu [1]

Du, Jingcheng [1,2]

Kim, Sun [1]

Wilbur, W. John [1]

Lu, Zhiyong [1]

Affiliations:

[1] Natl Inst Hlth NIH, Natl Ctr Biotechnol Informat NCBI, Natl Lib Med NLM, Bethesda, MD 20814 USA

[2] UTHlth, Sch Biomed Informat, Houston, TX USA

Source:

ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS | 2019

Keywords:

Natural language processing; EHR; textual similarity;

DOI:

10.1145/3307339.3343239

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Capturing the semantic similarity between sentences plays a vital role in several primary applications in the biomedical and clinical domains: biomedical sentence search, evidence attribution, question answering, and text summarization. In this pilot study, we evaluated the effectiveness of five representative sentence similarity models in the clinical domain, ranging from traditional machine learning methods to the latest bidirectional transformers. The evaluation was performed on a dataset of over 1K sentence pairs from EMRs, the largest public dataset in this domain by far. The results show that embeddings trained on large biomedical corpora are the most effective methods. They also demonstrate that CNN and BERT are effective at capturing sentence similarity on relatively small datasets.


Pages: 533-533

Page count: 1