Normalization of relative and incomplete temporal expressions in clinical narratives

Cited by: 8
Authors
Sun, Weiyi [1]
Rumshisky, Anna [2]
Uzuner, Ozlem [3]
Affiliations
[1] SUNY Albany, Dept Informat, Albany, NY 12222 USA
[2] Univ Massachusetts, Dept Comp Sci, Lowell, MA USA
[3] SUNY Albany, Dept Informat Studies, Albany, NY 12222 USA
Funding
National Institutes of Health (USA)
Keywords
temporal reasoning; medical language processing; temporal expression normalization; information extraction; events; system; text
DOI
10.1093/jamia/ocu004
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: To improve the normalization of relative and incomplete temporal expressions (RI-TIMEXes) in clinical narratives. Methods: We analyzed the RI-TIMEXes in temporally annotated corpora and proposed two hypotheses regarding the normalization of RI-TIMEXes in the clinical narrative domain: the anchor point hypothesis and the anchor relation hypothesis. We annotated the RI-TIMEXes in three corpora to study the characteristics of RI-TIMEXes in different domains. This informed the design of our RI-TIMEX normalization system for the clinical domain, which consists of an anchor point classifier, an anchor relation classifier, and a rule-based RI-TIMEX text span parser. We experimented with different feature sets and performed an error analysis for each system component. Results: The annotation confirmed the hypotheses, indicating that the RI-TIMEX normalization task can be simplified using two multi-label classifiers. Our system achieves anchor point classification, anchor relation classification, and rule-based parsing accuracy of 74.68%, 87.71%, and 57.2% (82.09% under relaxed matching criteria), respectively, on the held-out test set of the 2012 i2b2 temporal relation challenge. Discussion: Experiments with feature sets show, for example, that the verbal tense feature informs anchor relation classification in clinical narratives less than the tokens near the RI-TIMEX do. Error analysis showed that underrepresented anchor point and anchor relation classes are difficult to detect. Conclusions: We formulate the RI-TIMEX normalization problem as a pair of multi-label classification problems. Considering only RI-TIMEX extraction and normalization, the system achieves a statistically significant improvement over the RI-TIMEX results of the best systems in the 2012 i2b2 challenge.
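The three-component design described in the abstract (anchor point, anchor relation, text span parsing) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the label sets, the function names `parse_offset` and `normalize`, and the toy parsing rule are all assumptions made for illustration, and the two classifier outputs are passed in by hand rather than predicted.

```python
import re
from datetime import date, timedelta

# Hypothetical label sets; the abstract does not enumerate the real classes.
ANCHOR_POINTS = ("admission", "discharge", "previous_timex")
ANCHOR_RELATIONS = ("before", "after", "equal")

_NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4,
                 "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
_UNIT_DAYS = {"day": 1, "week": 7}
_SPAN_RE = re.compile(r"\b(\d+|[a-z]+)\s+(day|week)s?\b")


def parse_offset(span: str) -> timedelta:
    """Toy rule-based span parser: find '<quantity> <unit>' and return a duration."""
    for match in _SPAN_RE.finditer(span.lower()):
        quantity, unit = match.groups()
        if quantity.isdigit():
            count = int(quantity)
        elif quantity in _NUMBER_WORDS:
            count = _NUMBER_WORDS[quantity]
        else:
            continue  # e.g. "next day": not handled by this toy rule
        return timedelta(days=count * _UNIT_DAYS[unit])
    return timedelta(0)


def normalize(span: str, anchor_point: str, anchor_relation: str,
              anchors: dict) -> str:
    """Combine an anchor date, a relation, and a parsed offset into an ISO date.

    In a full system, anchor_point and anchor_relation would come from the
    two classifiers; here they are supplied directly (an assumption).
    """
    anchor_date = anchors[anchor_point]
    offset = parse_offset(span)
    if anchor_relation == "before":
        return (anchor_date - offset).isoformat()
    if anchor_relation == "after":
        return (anchor_date + offset).isoformat()
    if anchor_relation == "equal":
        return anchor_date.isoformat()
    raise ValueError(f"unknown anchor relation: {anchor_relation}")


# Example anchors for one hypothetical discharge summary.
anchors = {"admission": date(2012, 3, 10), "discharge": date(2012, 3, 15)}
print(normalize("two days later", "admission", "after", anchors))
print(normalize("3 days ago", "discharge", "before", anchors))
```

Separating the anchor decision from the offset arithmetic mirrors the decomposition in the abstract: once the anchor point and relation are fixed, resolving the expression reduces to parsing the span and applying a date offset.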
Pages: 1001-1008
Page count: 8