Evaluation of Five Sentence Similarity Models on Electronic Medical Records

Cited by: 3
Authors
Chen, Qingyu [1]
Du, Jingcheng [1, 2]
Kim, Sun [1]
Wilbur, W. John [1]
Lu, Zhiyong [1]
Affiliations
[1] Natl Inst Hlth NIH, Natl Ctr Biotechnol Informat NCBI, Natl Lib Med NLM, Bethesda, MD 20814 USA
[2] UTHlth, Sch Biomed Informat, Houston, TX USA
Source
ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS | 2019
Keywords
Natural language processing; EHR; textual similarity
DOI
10.1145/3307339.3343239
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Capturing the semantic similarity between sentences plays a vital role in several primary applications in the biomedical and clinical domains: biomedical sentence search, evidence attribution, question answering, and text summarization. In this pilot study, we evaluated the effectiveness of five representative sentence similarity models in the clinical domain, ranging from traditional machine learning methods to the latest bidirectional transformers. The evaluation was performed on a dataset of over 1K sentence pairs from EMRs, the largest public dataset in this domain by far. The results show that embeddings trained on large biomedical corpora are the most effective methods. They also demonstrate that CNN and BERT capture sentence similarity effectively on relatively small datasets.
Pages: 533 / 533
Page count: 1