Secondary use of electronic health records for building cohort studies through top-down information extraction

Cited by: 15
Authors
Kreuzthaler, Markus [1]
Schulz, Stefan [1]
Berghold, Andrea [1]
Affiliations
[1] Med Univ Graz, Inst Med Informat Stat & Documentat, Graz, Austria
Keywords
Information extraction; Secondary use; Clinical narrative; CLINICAL TEXT; BIOBANKS; SYSTEM; I2B2
DOI
10.1016/j.jbi.2014.10.010
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Controlled clinical trials are usually supported with an in-front data aggregation system, which supports the storage of relevant information according to the trial context within a highly structured environment. In contrast to the documentation of clinical trials, daily routine documentation has many characteristics that influence data quality. One such characteristic is the use of non-standardized text, which is an indispensable part of information representation in clinical information systems. Based on a cohort study we highlight challenges for mining electronic health records targeting free text entry fields within semi-structured data sources. Our prototypical information extraction system achieved an F-measure of 0.91 (precision = 0.90, recall = 0.93) for the training set and an F-measure of 0.90 (precision = 0.89, recall = 0.92) for the test set. We analyze the obtained results in detail and highlight challenges and future directions for the secondary use of routine data in general. (C) 2014 Elsevier Inc. All rights reserved.
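The reported scores can be checked against each other with a short calculation. The sketch below assumes that "F-measure" means the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state the weighting, so this is an assumption. The function name `f_measure` is illustrative and is not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (assumed here, since the
    abstract does not specify the weighting).
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both inputs are zero, so the score is zero by convention
    return (1 + beta ** 2) * precision * recall / denominator


# Values reported in the abstract.
training_f = f_measure(precision=0.90, recall=0.93)
test_f = f_measure(precision=0.89, recall=0.92)

print(f"training set: {training_f:.2f}")
print(f"test set:     {test_f:.2f}")
```

Under that assumption, both values round to the F-measures given in the abstract (0.91 and 0.90).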
Pages: 188-195
Number of pages: 8