Identifying patient smoking status from medical discharge records

Cited by: 203
Authors
Uzuner, Oezlem [1, 2]
Goldstein, Ira [1 ]
Luo, Yuan [1 ]
Kohane, Isaac [3, 4]
Affiliations
[1] SUNY Albany, Albany, NY 12222 USA
[2] MIT, Boston, MA USA
[3] Childrens Hosp, Boston, MA 02115 USA
[4] Harvard Univ, Sch Med, Boston, MA USA
DOI
10.1197/jamia.M2408
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, providing a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results. There were 12 system runs with microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", Social History, etc.). Many of the effective smoking status identifiers benefit from these features.
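The abstract says the runs were scored with microaveraged and macroaveraged precision, recall, and F-measure. The difference between the two averages can be sketched as follows. This is a minimal illustration, not the challenge's official scorer: the function names, the label strings, and the toy data are assumptions, and the macro F-measure here is the mean of per-class F-measures (other conventions exist).

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and F-measure from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro(gold, pred):
    """Return ((micro P, R, F), (macro P, R, F)) for single-label predictions."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    # Macro: score each class separately, then average the scores.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    macro = tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))
    # Micro: pool the counts over all classes, then score once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical toy data; the label names are illustrative only.
gold = ["SMOKER", "SMOKER", "SMOKER", "UNKNOWN"]
pred = ["SMOKER", "SMOKER", "UNKNOWN", "UNKNOWN"]
micro, macro = micro_macro(gold, pred)
print("micro P/R/F:", micro)
print("macro P/R/F:", macro)
```

Microaveraging weights every record equally, so frequent classes dominate the score; macroaveraging weights every class equally, so performance on rare classes is more visible.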
Pages: 14-24
Number of pages: 11
Related papers
53 in total
  • [1] Aramaki E, 2006, I2B2 WORKSH CHALL NA
  • [2] Detecting adverse events using information technology
    Bates, DW
    Evans, RS
    Murff, H
    Stetson, PD
    Pizziferri, L
    Hripcsak, G
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2003, 10 (02) : 115 - 128
  • [3] Cross-language evaluation forum: Objectives, results, achievements
    Braschler, M
    Peters, C
    [J]. INFORMATION RETRIEVAL, 2004, 7 (1-2) : 7 - 31
  • [4] CARRERO FM, 2006, I2B2 WORKSH CHALL NA
  • [5] Promises of text processing: natural language processing meets AI
    Chang, JT
    Altman, RB
    [J]. DRUG DISCOVERY TODAY, 2002, 7 (19) : 992 - 993
  • [6] Classification of emergency department chief complaints into 7 syndromes: A retrospective analysis of 527,228 patients
    Chapman, WW
    Dowling, JN
    Wagner, MM
    [J]. ANNALS OF EMERGENCY MEDICINE, 2005, 46 (05) : 445 - 455
  • [7] A simple algorithm for identifying negated findings and diseases in discharge summaries
    Chapman, WW
    Bridewell, W
    Hanbury, P
    Cooper, GF
    Buchanan, BG
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2001, 34 (05) : 301 - 310
  • [8] Classifying free-text triage chief complaints into syndromic categories with natural language processing
    Chapman, WW
    Christensen, LM
    Wagner, MM
    Haug, PJ
    Ivanov, O
    Dowling, JN
    Olszewski, RT
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2005, 33 (01) : 31 - 40
  • [9] CHINCHOR N, 1992, 4 MESS UND C MUC 4 M
  • [10] Clinical classification and terminology: Some history and current observations
    Chute, CG
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2000, 7 (03) : 298 - 303