Combination of Loss-based Active Learning and Semi-supervised Learning for Recognizing Entities in Chinese Electronic Medical Records

Cited by: 2
Authors
Yan, Jinghui [1]
Zong, Chengqing [2, 3]
Xu, Jinan [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Informat Technol, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Sch Comp Sci & Informat Technol, Beijing 100049, Peoples R China
[3] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100049, Peoples R China
Keywords
Electronic medical record; loss-based active learning; dynamic balance strategy; semi-supervised learning; recognition
DOI
10.1145/3588314
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The recognition of entities in electronic medical records (EMRs) is especially important for downstream tasks such as clinical entity normalization and medical dialogue understanding. However, in the medical domain, training a high-quality named entity recognition (NER) system typically requires large-scale annotated datasets, which are highly expensive to obtain. In this article, to lower the cost of data annotation and make maximal use of unlabeled data, we propose a hybrid approach to recognizing entities in Chinese EMRs that combines loss-based active learning with semi-supervised learning. Specifically, we adopt a dynamic balance strategy that adjusts, at different stages of the process, the balance between the minimum loss predicted by the NER decoder and that predicted by a loss prediction module. Experimental results demonstrate the effectiveness and efficiency of the proposed framework, which achieves higher performance than existing approaches on Chinese EMR entity recognition datasets under limited labeling resources.
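The abstract describes the selection loop only at a high level. The Python sketch below illustrates one plausible reading of a single round, under stated assumptions; it is not the authors' algorithm. The names `balance_weight` and `select_samples`, the linear weight schedule, the convex combination of the two loss estimates, and the rule "highest combined loss goes to annotators, lowest goes to pseudo-labeling" are all hypothetical choices made for illustration. The decoder loss for an unlabeled sentence is assumed to be something like the negative log-probability of the decoder's own best tag sequence.

```python
from typing import List, Sequence, Tuple


def balance_weight(round_idx: int, total_rounds: int) -> float:
    """Hypothetical linear schedule for the weight on the decoder's loss.

    Round 0 relies entirely on the loss prediction module; the final round
    relies entirely on the NER decoder. This schedule is an assumption.
    """
    if total_rounds <= 1:
        return 1.0
    return round_idx / (total_rounds - 1)


def select_samples(
    decoder_loss: Sequence[float],
    predicted_loss: Sequence[float],
    round_idx: int,
    total_rounds: int,
    k_annotate: int,
    k_pseudo: int,
) -> Tuple[List[int], List[int]]:
    """Return (indices to annotate, indices to pseudo-label) for one round.

    decoder_loss: assumed per-sentence loss estimate from the NER decoder.
    predicted_loss: per-sentence output of a loss prediction module.
    """
    if len(decoder_loss) != len(predicted_loss):
        raise ValueError("loss lists must have the same length")
    w = balance_weight(round_idx, total_rounds)
    # Assumed dynamic balance: a convex mix of the two loss estimates.
    scores = [w * d + (1.0 - w) * p for d, p in zip(decoder_loss, predicted_loss)]
    # Sentence indices ordered from lowest to highest combined loss.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Active-learning step: highest-loss sentences go to human annotators.
    annotate = order[::-1][:k_annotate]
    # Semi-supervised step: the lowest-loss remaining sentences are
    # pseudo-labeled with the model's own predictions.
    chosen = set(annotate)
    pseudo = [i for i in order if i not in chosen][:k_pseudo]
    return annotate, pseudo


if __name__ == "__main__":
    dec = [0.1, 2.0, 0.5, 3.0, 0.05]
    pred = [0.2, 1.5, 0.4, 2.5, 0.3]
    print(select_samples(dec, pred, 0, 3, 2, 2))  # ([3, 1], [0, 4])
    print(select_samples(dec, pred, 2, 3, 2, 2))  # ([3, 1], [4, 0])
```

In this toy run the annotation set stays the same across rounds while the pseudo-label set changes order, because the early rounds rank sentences by the loss prediction module and the later rounds by the decoder. Keeping the two selected sets disjoint avoids pseudo-labeling a sentence that is also sent for human annotation.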
Pages: 19