Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research

Cited by: 100
Authors
Gupta, D
Saul, M
Gilbertson, J
Affiliations
[1] Univ Pittsburgh, Dept Pathol, Pittsburgh, PA 15232 USA
[2] Univ Pittsburgh, Med Ctr Presbyterian Shadyside, Ctr Pathol Informat, Pittsburgh, PA 15232 USA
[3] Univ Pittsburgh, Med Ctr Presbyterian Shadyside, Ctr Oncol Informat, Dept Pathol, Pittsburgh, PA 15232 USA
[4] Univ Pittsburgh, Sch Hlth Sci, Ctr Biomed Informat, Clin Res Informat Serv, Pittsburgh, PA USA
Keywords
deidentification; Health Insurance Portability and Accountability Act; HIPAA; safe-harbor elements; confidentiality; pathology reports
DOI
10.1309/E6K33GBPE5C27FYU
Chinese Library Classification (CLC) number
R36 [Pathology]
Discipline classification code
100104
Abstract
We evaluated a comprehensive deidentification engine at the University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA, that uses a complex set of rules, dictionaries, pattern-matching algorithms, and the Unified Medical Language System to identify and replace identifying text in clinical reports while preserving medical information for sharing in research. In our initial data set of 967 surgical pathology reports, the software did not suppress outside (103), UPMC (47), and non-UPMC (56) accession numbers; dates (7); names (9) or initials (25) of case pathologists; or hospital or laboratory names (46). In 150 reports, some clinical information was suppressed inadvertently (overmarking). The engine retained eponymic patient names, e.g., Barrett and Gleason. In the second evaluation (1,000 reports), the software did not suppress outside (90) or UPMC (6) accession numbers or names (4) or initials (2) of case pathologists. In the third evaluation, the software removed names of patients, hospitals (297/300), pathologists (297/300), transcriptionists, residents and physicians, dates of procedures, and accession numbers (298/300). By the end of the evaluation, the system was reliably and specifically removing safe-harbor identifiers and producing highly readable deidentified text without removing important clinical information. Collaboration between pathology domain experts and system developers and continuous quality assurance are needed to optimize ongoing deidentification processes.
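The abstract describes the engine only at a high level (rules, dictionaries, and pattern matching), so the following is a minimal illustrative sketch of that general approach, not the evaluated software. The regular expressions, the name dictionary, the eponym allowlist, and the function name `deidentify` are all assumptions made for illustration; real accession-number and date formats vary by institution.

```python
import re

# Hypothetical patterns (assumptions): the formats used at any real
# institution would need to be confirmed by domain experts.
ACCESSION_RE = re.compile(r"\b[A-Z]{1,3}\d{2}-\d{3,6}\b")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

# Hypothetical dictionaries (assumptions), stored in lowercase.
KNOWN_NAMES = {"smith", "jones", "barrett"}
# Medical eponyms that look like surnames but carry clinical meaning.
EPONYMS = {"barrett", "gleason"}

WORD_RE = re.compile(r"[A-Za-z]+")


def deidentify(text: str) -> str:
    """Replace accession numbers, dates, and dictionary names with tags."""
    text = ACCESSION_RE.sub("[ACCESSION]", text)
    text = DATE_RE.sub("[DATE]", text)

    def replace_name(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()
        # Suppress dictionary names unless they are allowlisted eponyms,
        # which limits overmarking of clinical terms.
        if key in KNOWN_NAMES and key not in EPONYMS:
            return "[NAME]"
        return word

    return WORD_RE.sub(replace_name, text)


if __name__ == "__main__":
    sample = ("Case SP04-12345 received 3/14/2003. "
              "Dr. Smith reports Barrett esophagus; Gleason score 7.")
    print(deidentify(sample))
```

In this sketch the eponym allowlist trades one error for another: it prevents terms such as "Barrett esophagus" from being overmarked, but a person whose surname is Barrett would also be left in the text. That kind of trade-off is one reason the abstract's call for review by pathology domain experts matters.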
Pages: 176-186
Page count: 11