Chinese clinical named entity recognition with variant neural structures based on BERT methods

Cited by: 122
Authors
Li, Xiangyang [1, 2]
Zhang, Huan [3]
Zhou, Xiao-Hua [4, 5]
Affiliations
[1] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[2] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
[3] Peking Univ, Acad Adv Interdisciplinary Studies, Beijing 100871, Peoples R China
[4] Peking Univ, Beijing Int Ctr Math Res, Beijing 100871, Peoples R China
[5] Peking Univ, Dept Biostat, Beijing 100871, Peoples R China
Keywords
Clinical named entity recognition; BERT; LSTM; CRF
DOI
10.1016/j.jbi.2020.103422
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Clinical Named Entity Recognition (CNER) is a critical task that aims to identify and classify clinical terms in electronic medical records. In recent years, deep neural networks have achieved significant success in CNER. However, these methods require large-scale, high-quality labeled clinical data, which is difficult and expensive to obtain, especially for Chinese clinical records. To tackle the Chinese CNER task, we pre-train a BERT model on unlabeled Chinese clinical records so that it can leverage domain-specific knowledge from unlabeled text. Additional layers, such as Long Short-Term Memory (LSTM) and Conditional Random Field (CRF), are used to extract text features and to decode the predicted tags, respectively. In addition, we propose a new strategy for incorporating dictionary features into the model. Radical features of Chinese characters are also used to improve model performance. To the best of our knowledge, our ensemble model outperforms state-of-the-art models, achieving an 89.56% strict F1 score on the CCKS-2018 dataset and a 91.60% F1 score on the CCKS-2017 dataset.
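The abstract reports a "strict" F1 score. A minimal sketch of how such a score is commonly computed is given below, assuming that "strict" means a predicted entity counts as correct only when its start, end, and type all match a gold entity exactly, and assuming BIO-style tags. The tag names (`DRUG`, `SYM`) and the function names are hypothetical illustrations, not taken from the paper or from the official CCKS scorer, which may differ in detail.

```python
from typing import List, Set, Tuple

# (start index, exclusive end index, entity type)
Entity = Tuple[int, int, str]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect (start, end, type) spans from one BIO tag sequence."""
    entities: Set[Entity] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def strict_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Entity-level F1 where only exact span-and-type matches count."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_entities(g_tags), extract_entities(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the second prediction gets the boundary of the
# first entity wrong, so that entity earns no credit under strict matching.
gold = [["B-DRUG", "I-DRUG", "O", "B-SYM"]]
pred_missing = [["B-DRUG", "I-DRUG", "O", "O"]]
pred_boundary = [["B-DRUG", "O", "O", "B-SYM"]]
score_missing = strict_f1(gold, pred_missing)
score_boundary = strict_f1(gold, pred_boundary)
```

Under this assumed definition, a prediction that overlaps a gold entity but misses one boundary character is scored the same as a prediction that misses the entity entirely, which is why strict scores tend to be lower than relaxed (overlap-based) ones.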
Pages: 7