Joint segmentation and named entity recognition using dual decomposition in Chinese discharge summaries

Cited by: 44
Authors
Xu, Yan [1 ,2 ]
Wang, Yining [2 ,3 ]
Liu, Tianren [2 ,3 ]
Liu, Jiahua [2 ,4 ]
Fan, Yubo [1 ]
Qian, Yi [5 ]
Tsujii, Junichi [2 ]
Chang, Eric I. [2 ]
Affiliations
[1] Beihang Univ, Minist Educ, Key Lab Biomech & Mechanobiol, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
[3] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Dept Comp Sci, Beijing 100084, Peoples R China
[4] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[5] Jinhua Peoples Hosp, Jinhua, Zhejiang, Peoples R China
Funding
US National Science Foundation;
Keywords
INFORMATION;
DOI
10.1136/amiajnl-2013-001806
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812 ;
Abstract
Objective This paper focuses on three aspects: (1) annotating a gold-standard corpus of Chinese discharge summaries; (2) performing word segmentation and named entity recognition on that corpus; and (3) building a joint model that performs word segmentation and named entity recognition simultaneously.
Design Two independent systems for word segmentation and named entity recognition were built based on conditional random field models. In natural language processing, most approaches use a single model to predict outputs, but many studies have shown that performance can be improved by combining models. We therefore propose a joint model that uses dual decomposition to perform both tasks and exploit the correlations between them. Three feature sets were designed to demonstrate the advantage of the proposed joint model over independent models, incremental models, and a joint model trained on combined labels.
Measurements Micro-averaged precision (P), recall (R), and F-measure (F) were used to evaluate results.
Results The gold-standard corpus consists of 336 Chinese discharge summaries containing 71 355 words. The dual decomposition framework achieved a 0.2% improvement in segmentation and a 1% improvement in recognition compared with performing each task alone.
Conclusions The joint model is efficient and effective for both segmentation and recognition compared with the two individual models, and its encouraging results demonstrate the feasibility of the two tasks.
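The Design section above describes coupling a word-segmentation CRF and an NER CRF through dual decomposition. As a rough sketch of that general technique (not the paper's exact algorithm, which the abstract does not specify), the Python fragment below shows a standard subgradient dual decomposition loop that drives two decoders toward agreement on per-character word-boundary decisions; seg_decode, ner_decode, and their penalties interface are hypothetical placeholders for Viterbi decoders over the two CRFs.

def dual_decompose(sentence, seg_decode, ner_decode, max_iter=50, step_size=1.0):
    # One Lagrange multiplier per character position, enforcing agreement
    # on the 0/1 "word boundary after this character" indicators.
    u = [0.0] * len(sentence)
    seg_bounds = ner_bounds = [0] * len(sentence)

    for t in range(1, max_iter + 1):
        # Each (hypothetical) decoder returns a 0/1 boundary indicator per
        # character and accepts additive per-position boundary penalties.
        seg_bounds = seg_decode(sentence, penalties=[+ui for ui in u])
        ner_bounds = ner_decode(sentence, penalties=[-ui for ui in u])

        # If the sub-models agree everywhere, the relaxation is tight and
        # the pair is an exact solution of the joint objective.
        if seg_bounds == ner_bounds:
            break

        # Subgradient step on the dual variables to reduce disagreement.
        rate = step_size / t
        u = [ui - rate * (s - z) for ui, s, z in zip(u, seg_bounds, ner_bounds)]

    return seg_bounds, ner_bounds

If the loop ends without full agreement, implementations commonly fall back to one sub-model's output or the best pair seen so far; the abstract does not say which strategy the authors use.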
Pages: E84-E92
Page count: 9