Clinical diagnosis normalization based on contrastive learning and pre-trained model

Cited by: 0

Authors:

Liu Y. [1]

Cui B. [1, 2]

Cao L. [2]

Cheng L. [1, 2]

Affiliations:

[1] Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin

[2] China Electronics Cloud Brain (Tianjin) Technology Co. Ltd., Tianjin

Source:

Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition) | 2024 / Vol. 52 / No. 05

Keywords:

bidirectional encoder representations from transformers (BERT); clinical diagnosis normalization; contrastive learning; pre-trained model; simple contrastive learning of sentence embeddings (SimCSE)

DOI:

10.13245/j.hust.240133


Abstract:

To address three problems in the clinical diagnosis normalization task, namely the large scale of the standard diagnostic thesaurus, limited textual relevance, and the uncertain number of standard terms corresponding to each diagnosis, a clinical diagnosis normalization method based on contrastive learning and a pre-trained model was proposed. First, the simple contrastive learning of sentence embeddings (SimCSE) model was trained with a combination of unsupervised and supervised methods, and the resulting model was used to recall candidate standard terms from the standard thesaurus. Then, bidirectional encoder representations from transformers (BERT) was used to rerank the candidate terms and to classify the number of standard terms, yielding the final results. Experimental results show that the combined unsupervised and supervised SimCSE method achieves a recall of 86.76%, higher than the other methods compared, and that BERT shows significant improvements over the other models on several metrics for reranking and term-count classification. The proposed method achieves an F1 score of 72.54% on the test dataset, indicating good performance in clinical diagnosis normalization. © 2024 Huazhong University of Science and Technology. All rights reserved.
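The recall-then-rerank pipeline summarized in the abstract can be illustrated with a minimal sketch. It assumes that text embeddings have already been produced by an encoder (the SimCSE model in the paper) and shows three pieces: the in-batch InfoNCE objective of the kind SimCSE optimizes (simplified to in-batch negatives only, with an assumed temperature of 0.05), top-k candidate recall by cosine similarity, and final term selection from reranker scores plus a predicted term count. The BERT reranker and term-count classifier are not implemented; their outputs are passed in as plain numbers. All function names and toy values are hypothetical and are not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def infonce_loss(anchors: List[Vector], positives: List[Vector],
                 temperature: float = 0.05) -> float:
    """Mean in-batch InfoNCE loss of the kind SimCSE optimizes.

    anchors[i] and positives[i] form a positive pair; positives[j] for j != i
    act as in-batch negatives. In unsupervised SimCSE the positive is a second
    dropout-noised encoding of the same text; in a supervised setting it could be
    the paired standard term. Hard negatives are omitted in this sketch, and the
    temperature of 0.05 is an assumption, not a value taken from the paper.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / n


def recall_top_k(query: Vector, thesaurus: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Recall the k standard terms whose embeddings are most similar to the query."""
    ranked = sorted(thesaurus, key=lambda item: cosine(query, item[1]), reverse=True)
    return [term for term, _ in ranked[:k]]


def select_terms(reranked: List[Tuple[str, float]], predicted_count: int) -> List[str]:
    """Keep the predicted number of terms with the highest reranker scores.

    The scores and predicted_count are hypothetical stand-ins for the outputs of
    the BERT reranker and term-count classifier, which are not implemented here.
    """
    ordered = sorted(reranked, key=lambda item: item[1], reverse=True)
    return [term for term, _ in ordered[:predicted_count]]
```

For example, recall_top_k([1.0, 0.0], [("term_A", [1.0, 0.0]), ("term_B", [0.0, 1.0]), ("term_C", [0.9, 0.1])], 2) returns ["term_A", "term_C"]. Passing reranker scores and a predicted count of 2 to select_terms keeps the two highest-scoring candidates. Separating recall from selection in this way lets a cheap similarity search narrow a large thesaurus before the more expensive reranking step.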


Pages: 23-28

Number of pages: 5

