Large-scale evaluation of automated clinical note de-identification and its impact on information extraction
Cited by: 51
Authors:
Deleger, Louise [1]
Molnar, Katalin [1]
Savova, Guergana [2,3]
Xia, Fei [4]
Lingren, Todd [1]
Li, Qi [1]
Marsolo, Keith [1]
Jegga, Anil [1]
Kaiser, Megan [1]
Stoutenborough, Laura [1]
Solti, Imre [1]
Affiliations:
[1] Cincinnati Childrens Hosp Med Ctr, Div Biomed Informat, Cincinnati, OH 45229 USA
[2] Childrens Hosp Boston Informat Program, Boston, MA USA
[3] Harvard Univ, Sch Med, Boston, MA USA
[4] Univ Washington, Dept Linguist, Seattle, WA 98195 USA
Objective: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms on the de-identified documents.

Material and methods: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. Sensitivity, precision, and F value of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed. Performance was assessed against a manually generated 'gold standard'. Statistical significance was tested. The automated de-identification performance was also compared with that of two humans on a 10% subsample of the gold standard. The effect of de-identification on the performance of subsequent medication extraction was measured.

Results: The gold standard included 30 815 protected health information elements and more than one million tokens. The most accurate NLP method had 91.92% sensitivity (R) and 95.08% precision (P) overall. The performance of the system was indistinguishable from that of human annotators (annotators' performance was 92.15%(R)/93.95%(P) and 94.55%(R)/88.45%(P) overall, while the best system obtained 92.91%(R)/95.73%(P) on the same text). The impact of automated de-identification was minimal on the utility of the narrative notes for subsequent information extraction as measured by the sensitivity and precision of medication name extraction.

Discussion and conclusion: NLP-based de-identification shows excellent performance that rivals the performance of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.
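The abstract names sensitivity (R), precision (P), and F value without spelling out how they relate. The sketch below uses the standard textbook definitions; it is not the authors' evaluation code. The function names and the example counts are hypothetical, and the abstract does not state which F variant was used, so the balanced F1 (harmonic mean of P and R) is an assumption.

```python
def sensitivity_precision(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (sensitivity, precision) as fractions from match counts.

    tp: PHI elements the system found that are in the gold standard
    fp: elements the system flagged that are not in the gold standard
    fn: gold-standard elements the system missed
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def f_measure(precision: float, sensitivity: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 (assumed here) is the harmonic mean of P and R.

    Inputs may be fractions or percentages, as long as both use the same scale.
    """
    denominator = beta ** 2 * precision + sensitivity
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * sensitivity / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the study).
    r, p = sensitivity_precision(tp=8, fp=2, fn=2)
    print(f"R={r:.2f} P={p:.2f} F1={f_measure(p, r):.2f}")

    # Overall P and R as reported in the abstract; the resulting F1 is
    # derived here under the beta=1 assumption, not quoted from the source.
    print(f"{f_measure(95.08, 91.92):.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a system with high precision but low sensitivity cannot reach a high F value, which matters for de-identification, where missed identifiers are the costlier error.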
Authors:
Beckwith B.A. [1,2]
Mahaadevan R. [2]
Balis U.J. [2,3]
Kuo F. [2,4]
Affiliations:
[1] Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA
[2] Department of Pathology, Harvard Medical School, Boston, MA
[3] Department of Pathology, Massachusetts General Hospital, Boston, MA
[4] Department of Pathology, Brigham and Women's Hospital, Boston, MA