Using Natural Language Processing to Improve Efficiency of Manual Chart Abstraction in Research: The Case of Breast Cancer Recurrence

Times cited: 111
Authors
Carrell, David S. [1]
Halgrim, Scott [1]
Tran, Diem-Thy [1]
Buist, Diana S. M. [1]
Chubak, Jessica [1, 5]
Chapman, Wendy W. [2]
Savova, Guergana [3, 4]
Affiliations
[1] Group Health Research Institute, Seattle, WA 98101, USA
[2] University of California San Diego, Department of Medicine, Division of Biomedical Informatics, San Diego, CA 92103, USA
[3] Boston Children's Hospital, Informatics Program, Boston, MA, USA
[4] Harvard University, Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
[5] University of Washington, School of Public Health, Department of Epidemiology, Seattle, WA 98195, USA
Funding
National Institutes of Health (NIH)
Keywords
breast cancer recurrence; chart abstraction; natural language processing; electronic medical records; administrative data; discovery; identification; algorithms; support; system; older; risk
DOI
10.1093/aje/kwt441
Chinese Library Classification (CLC) number
R1 [Preventive medicine and hygiene]
Discipline classification codes
1004; 120402
Abstract
The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, such as cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 of them because electronic documents were unavailable. It also identified 5 additional recurrences that the reference standard had incorrectly classified as nonrecurrent. If used in similar cohorts, NLP could reduce the number of EHR charts that must be abstracted by 90% while identifying confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction.
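The headline figures above are screening arithmetic. As a minimal sketch, assuming that manual abstraction is applied only to charts the NLP system flags and that the rounded sensitivity and specificity apply uniformly across the cohort, the Python below recomputes sensitivity from the reported counts (60 of 65 recurrences found) and gives a rough expected number of flagged charts. The function names are hypothetical illustrations, not part of the authors' system.

```python
"""Back-of-the-envelope screening arithmetic using figures quoted in the abstract.

Assumption (not from the study): only NLP-flagged charts go to manual
abstraction, and rounded sensitivity/specificity apply uniformly, so the
flagged count is an approximation, not a study result.
"""


def sensitivity(detected: int, total_positive: int) -> float:
    """True-positive rate: recurrences found / all reference-standard recurrences."""
    return detected / total_positive


def expected_flagged(n_patients: int, n_recurrent: int,
                     sens: float, spec: float) -> float:
    """Expected number of charts flagged for review (true + false positives)."""
    n_nonrecurrent = n_patients - n_recurrent
    true_positives = sens * n_recurrent
    false_positives = (1.0 - spec) * n_nonrecurrent
    return true_positives + false_positives


def review_reduction(n_patients: int, n_flagged: float) -> float:
    """Fraction of charts that would no longer need manual abstraction."""
    return 1.0 - n_flagged / n_patients


if __name__ == "__main__":
    N_PATIENTS = 1472    # cohort size from the abstract
    N_RECURRENT = 65     # reference-standard recurrences
    N_MISSED = 5         # recurrences the NLP system overlooked

    sens = sensitivity(N_RECURRENT - N_MISSED, N_RECURRENT)
    flagged = expected_flagged(N_PATIENTS, N_RECURRENT, sens, spec=0.96)
    print(f"sensitivity       = {sens:.1%}")
    print(f"expected flagged ~= {flagged:.0f} of {N_PATIENTS}")
    print(f"review reduction ~= {review_reduction(N_PATIENTS, flagged):.0%}")
```

Run as a script, it reports a sensitivity of 92.3% and an expected review reduction slightly above the reported 90%. The gap is unsurprising: the authors' figure comes from their own analysis of the study data, which the abstract does not detail, so this calculation is only a sanity check on the order of magnitude.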
Pages: 749-758
Page count: 10