Anonymizing and Sharing Medical Text Records

Citations: 46
Authors
Li, Xiao-Bai [1]
Qin, Jialun [1]
Affiliations
[1] Univ Massachusetts Lowell, Manning Sch Business, Dept Operat & Informat Syst, Lowell, MA 01854 USA
Funding
National Institutes of Health (USA)
Keywords
privacy; information extraction; document clustering; anonymization; data analytics; PROTECTING PRIVACY; DE-IDENTIFICATION; DATABASES; INFORMATICS; DISCLOSURE;
DOI
10.1287/isre.2016.0676
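Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract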
Health information technology has increased accessibility of health and medical data and benefited medical research and healthcare management. However, there are rising concerns about patient privacy in sharing medical and healthcare data. A large amount of these data are in free text form. Existing techniques for privacy-preserving data sharing deal largely with structured data. Current privacy approaches for medical text data focus on detection and removal of patient identifiers from the data, which may be inadequate for protecting privacy or preserving data quality. We propose a new systematic approach to extract, cluster, and anonymize medical text records. Our approach integrates methods developed in both data privacy and health informatics fields. The key novel elements of our approach include a recursive partitioning method to cluster medical text records based on the similarity of the health and medical information and a value-enumeration method to anonymize potentially identifying information in the text data. An experimental study is conducted using real-world medical documents. The results of the experiments demonstrate the effectiveness of the proposed approach.
Pages: 332-352
Page count: 21