Evaluating Semantic Textual Similarity in Clinical Sentences Using Deep Learning and Sentence Embeddings

Cited by: 1
Authors
Antunes, Rui [1]
Silva, Joao Figueira [1]
Matos, Sergio [1]
Affiliations
[1] Univ Aveiro, DETI IEETA, Aveiro, Portugal
Source
Proceedings of the 35th Annual ACM Symposium on Applied Computing (SAC'20) | 2020
Keywords
Natural language processing; clinical information extraction; semantic textual similarity; deep learning; sentence embeddings
DOI
10.1145/3341105.3373987
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The wide adoption of electronic health records (EHRs) has fostered an improvement in healthcare quality, with EHRs currently representing a major source of medical information. Nevertheless, this process has also brought new challenges to the medical environment since the facilitated replication of information (e.g. using copy-paste) has resulted in less concise and sometimes incorrect information, which hinders the understandability of this data and can compromise the quality of medical decisions drawn from it. Due to the high volume and redundancy in medical data, it is imperative to develop solutions that can condense information whilst retaining its value, with a possible methodology involving the assessment of the semantic similarity between clinical text excerpts. In this paper we present an approach that explores neural networks and different types of text preprocessing pipelines, and that evaluates the impact of using word embeddings or sentence embeddings. We present the results following our participation in the n2c2 shared-task on clinical semantic textual similarity, perform an error analysis and discuss obtained results along with possible future improvements.
Pages: 662-669
Page count: 8